Iatreia
Print version ISSN 0121-0793
Iatreia vol.22 no.3 Medellín July/Sept. 2009
CRITICAL UPDATE
Clinical and epidemiological round
Óscar Osío Uribe1
1 Specialist in Internal Medicine, Master in Clinical Epidemiology, Director of the Corporación Académica de Patologías Tropicales, Universidad de Antioquia, Medellín, Colombia. Correspondence: www.riesgoscardiovasculares.com; cpt_udea@yahoo.com
PRESENTATION
In September 2009, the results of the Whitehall II study will be published in the journal Hypertension. This British study validated the predictive model for the development of hypertension obtained in the Framingham study, and we anticipate that many clinicians will have difficulties with its interpretation. A group of researchers led by Professor Mika Kivimäki* evaluated 6,704 British citizens aged between 35 and 68 years (initially normoglycemic, without hypertension or coronary artery disease) and tested the performance of the prognostic model for incident hypertension that had been developed several years earlier from the analysis of the Framingham offspring cohort. After four measurements at five-year intervals, the researchers summarised their findings in the following abstract: 'Both discrimination (C statistic: 0.80) and calibration (Hosmer–Lemeshow χ²: 11.5) of the Framingham hypertension risk score were good. Agreement between the predicted and observed hypertension incidences was excellent across the risk score distribution. The overall predicted:observed ratio was 1.08, slightly better among individuals older than 50 years (0.99 in men and 1.02 in women) than in the younger ones (1.16 in men and 1.18 in women). Reclassification with a modified score on the basis of our study population did not improve prediction (net reclassification improvement: -0.5%; 95% CI: -2.5% to 1.5%). These data suggest that the Framingham hypertension risk score provides a valid tool to estimate near-term risk of developing hypertension'. The Clinical Epidemiological Round is proud to present in this issue of Iatreia the thoughts of Professor Pablo Perel on the importance of clinical prognosis for clinicians, patients and health planners. His article explains the practical use of these prediction models in a didactic manner.
All this improves our ability to critically evaluate this piece of research and will also enhance our capacity to apply its findings to many areas of clinical practice.
AN INTRODUCTION TO PROGNOSTIC MODELS
Pablo Perel1
DEFINITION OF PROGNOSIS
Prognosis (from the Greek 'pro', meaning before, and 'gnosis', meaning knowledge) is defined as 'the result of looking forward'.1 In the context of clinical epidemiology, prognosis can be defined as 'the probable course and outcome of a health condition over time' or as 'the future risk of adverse outcomes among people with existing disease'.2,3
IMPORTANCE OF PROGNOSIS IN CLINICAL PRACTICE
Clinical practice involves three main activities: identifying diseases (diagnosis), treating diseases (therapy) and predicting the course and outcome of diseases (prognosis). Although the three activities are interrelated, clinical research distinguishes between them, and prognostic research is considered the most neglected of the three.3,4
Prognosis was historically one of the most important activities of medical practice. Until the end of the nineteenth century, 10% of the content of medical textbooks was dedicated to prognosis; however, by 1970 this had decreased to almost zero.5 For many centuries, predicting the future was what both priests and doctors were expected to do, but the appearance of effective therapies shifted the focus of the clinical encounter to diagnosis and therapy.6 In recent years, however, there has been increasing interest in prognosis research.3 Among the reasons for this resurgence, Christakis proposed:7
1. Interest in humane terminal care and in the decision of whether or not to withdraw life support from critically ill patients.
2. Avoidance of futile treatment for reasons of justice or costs.
3. Availability of new 'technologies' (e.g. genetic tests, and biomarkers).
4. Increasing emphasis on patient autonomy.
5. Increasing prevalence of chronic diseases.
CLASSIFICATION OF PROGNOSIS RESEARCH
Prognostic studies can be classified into two categories according to their objective: explanatory studies or outcome prediction studies.2
Explanatory studies focus on the causal association between predictors and outcome. Some authors propose a further division into three phases: phase 1, identifying associations; phase 2, testing independent associations; and phase 3, understanding prognostic pathways.2
Outcome prediction studies, also known as prognostic models, combine different variables to obtain a probability of the outcome. According to the use of the estimated probability, these studies can be further divided into studies which are used:8,9
1. To inform doctors to make decisions for individual patients.
2. To inform patients and relatives.
3. For research purposes (for example, in the selection of patients, adjustment for baseline imbalances, or risk stratification in clinical trials).
4. To compare health services by allowing adjustment for case mix.
Variables influencing prognosis (predictors) can also be classified into three categories according to their characteristics:3
a) Environment (e.g. country, social class, hospital care).
b) Host (e.g. age, comorbidities).
c) Disease (e.g. genes, severity).
See Figure 1.
PROGNOSTIC MODELS
Definition
Outcome prediction studies have received different names, such as prognostic models, prediction models, risk scores, prognostic indices, clinical prediction rules, clinical prediction guides, or clinical decision rules.10,14 According to some authors, the term 'clinical decision rules' applies only to those models that also provide a diagnostic or therapeutic recommendation.15
Throughout this article I will use the term 'prognostic model', defined as the 'mathematical combination of two or more patient or disease characteristics to predict outcome'.16
Performance of prognostic models
The performance of prognostic models refers to how 'accurate' the model's predictions are in relation to observed data.8 According to Rothman, 'accuracy in estimation implies that the parameter that is the object of measurement is estimated with little error'.17 In the particular context of prognostic models, accuracy has two main components: calibration and discrimination.18
Calibration refers to the agreement between predicted and observed probabilities.19 For example, if the model assigns a 30% probability of mortality to traumatic brain injury (TBI) patients with certain characteristics, then 30 out of 100 patients with those characteristics would be expected to die if the model were perfectly calibrated. Calibration can be assessed graphically, by plotting observed against predicted outcomes, or through a statistical test such as the Hosmer–Lemeshow test. This test compares the observed number of people with events within risk groupings (e.g. deciles of risk) with the number predicted by the model. A small p value implies lack of fit.
Discrimination is a measure of how well a model separates those who develop the outcome from those who do not.19 It is generally measured through the area under the receiver operating characteristic (ROC) curve, also called the C statistic. A ROC curve is constructed by plotting pairs of true positive rate (sensitivity) and false positive rate (1 - specificity) for several cut-off values of the predicted probability of the outcome. The area under the ROC curve can be interpreted as the probability that a randomly selected person with the outcome will have a higher predicted probability than a randomly selected person without the outcome.20 For example, if a model has an area under the ROC curve (or C statistic) of 0.7, then for a randomly chosen pair of subjects, one with and one without the outcome, the model will assign the higher probability to the subject with the outcome 70 times out of 100.
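The grouping logic behind the Hosmer–Lemeshow test can be sketched in a few lines of Python. This is only an illustrative sketch, not a validated implementation: the function name `hosmer_lemeshow_statistic` and the example data are assumptions introduced here, and the conversion of the statistic into a p value (which requires the chi-square distribution) is deliberately left out.

```python
from typing import Sequence


def hosmer_lemeshow_statistic(predicted: Sequence[float],
                              observed: Sequence[int],
                              groups: int = 10) -> float:
    """Chi-square-type statistic comparing observed and expected events by risk group.

    predicted: model probabilities of the outcome (0 to 1)
    observed:  1 if the outcome occurred, 0 otherwise
    groups:    number of risk groupings (10 gives deciles of risk)
    """
    # Sort patients by predicted risk so that each group holds similar risks.
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected_events = sum(p for p, _ in chunk)
        observed_events = sum(y for _, y in chunk)
        expected_non_events = len(chunk) - expected_events
        observed_non_events = len(chunk) - observed_events
        # Add the squared discrepancy for events and for non-events.
        if expected_events > 0:
            statistic += (observed_events - expected_events) ** 2 / expected_events
        if expected_non_events > 0:
            statistic += (observed_non_events - expected_non_events) ** 2 / expected_non_events
    return statistic


# Hypothetical data: 100 patients, each with a predicted mortality of 30%.
well_calibrated = hosmer_lemeshow_statistic([0.3] * 100, [1] * 30 + [0] * 70, groups=1)
poorly_calibrated = hosmer_lemeshow_statistic([0.3] * 100, [1] * 60 + [0] * 40, groups=1)
print(well_calibrated, poorly_calibrated)
```

In this hypothetical example the statistic is close to zero when 30 of the 100 patients die and much larger when 60 die; a larger statistic corresponds to a smaller p value and hence to evidence of lack of fit.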
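The pairwise interpretation of the C statistic can be made concrete with a short Python sketch. The function name `c_statistic` and the four-patient data set are assumptions made for illustration; ties are counted as half, which is a common convention rather than something stated in this article.

```python
from typing import Sequence


def c_statistic(predicted: Sequence[float], observed: Sequence[int]) -> float:
    """Proportion of (event, non-event) pairs in which the event has the higher prediction."""
    with_outcome = [p for p, y in zip(predicted, observed) if y == 1]
    without_outcome = [p for p, y in zip(predicted, observed) if y == 0]
    if not with_outcome or not without_outcome:
        raise ValueError("need at least one subject with and one without the outcome")
    concordant = 0.0
    for p_event in with_outcome:
        for p_non_event in without_outcome:
            if p_event > p_non_event:
                concordant += 1.0   # pair correctly ordered by the model
            elif p_event == p_non_event:
                concordant += 0.5   # tie counted as half (assumed convention)
    return concordant / (len(with_outcome) * len(without_outcome))


# Hypothetical predictions for four patients; the first and third had the outcome.
print(c_statistic([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 3 of 4 pairs concordant: 0.75
```

A value of 0.5 corresponds to predictions that order pairs no better than chance, and 1.0 to perfect separation of those with and without the outcome.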
The relative importance of calibration and discrimination will depend on the intended application of the prognostic model.18 For example, for counselling an individual patient, calibration of the model will be more relevant, while for triage in a setting with limited resources, discrimination could be more important.
In addition to the measures of discrimination and calibration, we might be interested in performance measures for specific thresholds when a clinically relevant cut-off is already established. The accuracy rate (or correct classification rate) is calculated as (true positives + true negatives)/total, and its complement, the error rate (or misclassification rate), as (false positives + false negatives)/total. The problem with these measures is that equal weight is given to positive and negative results whereas, in general, false negatives are more important than false positives. Furthermore, the accuracy rate can be high simply because the outcome is very frequent or very infrequent. For example, if the average mortality for a condition is 7%, the accuracy rate would be 93% if the model classified all the patients as survivors.19
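The 7% mortality example can be reproduced with the following Python sketch. The function names and the cohort of 100 patients are hypothetical; death is treated as the 'positive' result, and sensitivity is added only to show what the accuracy rate hides.

```python
def accuracy_rate(tp: int, tn: int, fp: int, fn: int) -> float:
    """(true positives + true negatives) / total."""
    return (tp + tn) / (tp + tn + fp + fn)


def error_rate(tp: int, tn: int, fp: int, fn: int) -> float:
    """(false positives + false negatives) / total; the complement of the accuracy rate."""
    return (fp + fn) / (tp + tn + fp + fn)


def sensitivity(tp: int, fn: int) -> float:
    """Proportion of patients with the outcome that the model identifies."""
    return tp / (tp + fn)


# Hypothetical cohort of 100 patients with 7% mortality, where the model
# classifies every patient as a survivor: no deaths are predicted at all.
tp, fn = 0, 7    # deaths predicted / deaths missed
tn, fp = 93, 0   # survivors correctly classified / survivors predicted to die

print(accuracy_rate(tp, tn, fp, fn))  # 0.93
print(error_rate(tp, tn, fp, fn))     # 0.07
print(sensitivity(tp, fn))            # 0.0
```

The accuracy rate is 93% even though the model fails to identify a single death, which is why this measure on its own can be misleading for infrequent outcomes.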
More recently new measures have been proposed, such as the net reclassification improvement (NRI). The NRI has four components: proportion of individuals with events who move up or down a category and the proportion of individuals without events who move up or down a category. The NRI is obtained by combining the four components, but they should also be reported separately.21
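A minimal Python sketch of how the four components are combined follows. It assumes that risk categories are coded as ordered integers (for example 0 = low, 1 = intermediate, 2 = high) and uses the usual categorical definition of the NRI: net upward movement among individuals with events plus net downward movement among individuals without events. The function name and the eight-patient data set are hypothetical.

```python
from typing import Dict, Sequence


def net_reclassification_improvement(old_category: Sequence[int],
                                     new_category: Sequence[int],
                                     had_event: Sequence[int]) -> Dict[str, float]:
    """Return the four NRI components and the combined NRI."""
    counts = {True: {"up": 0, "down": 0, "total": 0},
              False: {"up": 0, "down": 0, "total": 0}}
    for old, new, event in zip(old_category, new_category, had_event):
        group = counts[bool(event)]
        group["total"] += 1
        if new > old:
            group["up"] += 1      # moved to a higher risk category
        elif new < old:
            group["down"] += 1    # moved to a lower risk category
    if counts[True]["total"] == 0 or counts[False]["total"] == 0:
        raise ValueError("need individuals both with and without events")
    events_up = counts[True]["up"] / counts[True]["total"]
    events_down = counts[True]["down"] / counts[True]["total"]
    non_events_up = counts[False]["up"] / counts[False]["total"]
    non_events_down = counts[False]["down"] / counts[False]["total"]
    return {
        "events_up": events_up,
        "events_down": events_down,
        "non_events_up": non_events_up,
        "non_events_down": non_events_down,
        # Moving up is an improvement for events, moving down for non-events.
        "nri": (events_up - events_down) + (non_events_down - non_events_up),
    }


# Hypothetical reclassification of eight patients (first four had the event).
result = net_reclassification_improvement(
    old_category=[0, 0, 1, 1, 1, 1, 0, 0],
    new_category=[1, 0, 1, 0, 0, 0, 0, 1],
    had_event=[1, 1, 1, 1, 0, 0, 0, 0],
)
print(result)
```

Returning the four components alongside the summary value follows the recommendation that they be reported separately.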
Finally, there are also overall performance measures such as the R², which is the proportion of variation in the outcome explained by the model, and the Brier score, which is a measure of the difference between actual outcomes and predictions. These measures do not distinguish between the different performance components, calibration and discrimination, so they are not very useful.19
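As an illustration of an overall performance measure, the Brier score for a binary outcome is usually computed as the mean squared difference between the predicted probability and the observed outcome (coded 0 or 1). The sketch below assumes that standard definition; the function name and data are hypothetical.

```python
from typing import Sequence


def brier_score(predicted: Sequence[float], observed: Sequence[int]) -> float:
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("predicted and observed must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(predicted, observed)) / len(predicted)


# Hypothetical predictions for four patients; lower scores mean better predictions.
print(brier_score([0.9, 0.1, 0.8, 0.3], [1, 0, 1, 0]))
```

A score of 0 would mean perfect predictions. Because a single number summarises both calibration and discrimination, it cannot show which of the two components is responsible for a poor result.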
Inaccuracy of clinical prediction
The lack of interest in prognosis has led to a weak medical training in this area and so it is not surprising that doctors feel poorly prepared and that they often disagree or are inaccurate in their predictions.7,22
There are numerous studies showing that physicians make errors when formulating a prognosis. In many of these studies the term accuracy is used in the more general epidemiological sense (measured with little error), and they do not necessarily use the standard specific measures of accuracy described above for evaluating prognostic models.
A systematic review compared physicians' clinical predictions of survival in terminally ill cancer patients with actual survival.23 The authors found eight studies (including 1,563 individuals) and reported that the median clinical prediction of survival was 42 days, whereas the actual median survival was 29 days. Overall, there was poor agreement (weighted kappa 0.36) between clinical prediction and actual survival.
A cohort study was conducted in 16 Dutch nursing homes and included 515 terminally ill non-cancer patients. The authors compared physicians' predictions with actual survival. Physicians were asked to predict death in the following periods: within one week (0 to 7 days), within 8 to 21 days, or within 22 to 42 days. The positive predictive value of physicians' predictions was high for those patients expected to die within one week (92%), but much lower for patients who were expected to die within 8 to 21 days (16%) or within 22 to 42 days (13%).24
In other areas, such as cardiovascular disease, similar results have been reported. For example, Pignone and collaborators developed 12 primary prevention scenarios with a five-year risk of coronary heart disease events, and conducted a survey among 79 physicians to compare their predictions with values calculated from Framingham risk equations. For the analysis, the authors divided each physician's estimated risk by the Framingham estimated risk and considered ratios between 0.67 and 1.5 to be 'accurate'. They reported that only 24% of the predictions were accurate.25 The main limitation of this study was the use of hypothetical cases, so the predictions could not be compared with actual outcomes.
In a cohort study that included 850 patients admitted for intensive care, physicians' predictions were compared with actual survival at hospital discharge, and approximately 70% of the patients who were estimated to have a 30% chance of survival actually survived. Unlike the predictions for cancer patients, doctors' predictions in this setting were in general pessimistic rather than optimistic.26
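The ratio criterion described above can be sketched in Python as follows. The function name, the labels and the example risks are assumptions made for illustration; in particular, treating the bounds 0.67 and 1.5 as inclusive is an assumption, since the text does not say how boundary values were handled.

```python
def classify_estimate(physician_risk: float, framingham_risk: float,
                      lower: float = 0.67, upper: float = 1.5) -> str:
    """Classify a physician's risk estimate relative to the Framingham estimate."""
    ratio = physician_risk / framingham_risk
    if ratio < lower:
        return "underestimate"
    if ratio > upper:
        return "overestimate"
    return "accurate"   # bounds treated as inclusive (assumption)


# Hypothetical scenario with a Framingham five-year risk of 5%.
for physician_risk in (0.02, 0.05, 0.10):
    print(physician_risk, classify_estimate(physician_risk, 0.05))
```

In this hypothetical scenario an estimate of 10% gives a ratio of 2.0 and would be counted as an overestimate, while an estimate of 2% gives a ratio of 0.4 and would be counted as an underestimate.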
Clinical prediction versus prognostic models
According to studies in cognitive psychology, the human brain is poorly prepared for making and updating precise quantitative predictions.27 Psychologists have been studying the question of clinical versus statistical prediction for more than 50 years.28 Over that period the results have generally shown that prognostic models are as accurate as, or more accurate than, clinical judgment.
Grove and collaborators conducted a systematic review of studies that compared statistical with clinical prediction. Studies from the areas of psychology and medicine that predicted outcomes such as human behaviour, disease diagnosis, or disease prognosis were included.29 They used a 25-page manual to code each study for publication variables and study design characteristics. Investigators were trained and two coders extracted the data with very high reliability (r = 0.97). A total of 136 studies were included. The authors used the term accuracy to refer to the error in the estimation of each of the methods in comparison with a gold standard. Different measures were reported in the studies, so the authors standardized them into a common metric (effect size, ES). For this, they first found a suitable transformation for each measure with a known variance and an approximately normal distribution, and then estimated the difference between the clinical and statistical prediction. A positive ES indicates superiority of statistical prediction. To conduct the meta-analysis they gave each ES a weight inversely proportional to its variance. The weighted summary statistic for the ES was 0.086. This indicates that on average statistical prediction was approximately 10% more 'accurate' than clinical prediction. Because there was evidence of statistical heterogeneity (Qt = 1635.2, p < 0.0001), the authors also reported the results using a different method. For this, they considered an ES < -0.1 as substantially favouring clinical prediction, an ES between -0.1 and 0.1 as indicating relative equality, and an ES > 0.1 as substantially favouring statistical prediction. With these criteria, statistical prediction was more accurate than clinical prediction in 46% of the studies, a similar result was obtained with both methods in 48%, and clinical prediction was superior in only 6% of the studies.
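The two summary approaches described above (an inverse-variance weighted mean of the effect sizes, and a three-way classification of each study using a margin of 0.1) can be sketched in Python. The effect sizes and variances below are invented for illustration and are not data from the review; the function names are likewise hypothetical.

```python
from typing import Sequence


def weighted_mean_effect(effect_sizes: Sequence[float],
                         variances: Sequence[float]) -> float:
    """Summary effect size with weights inversely proportional to each variance."""
    weights = [1.0 / v for v in variances]
    return sum(w * es for w, es in zip(weights, effect_sizes)) / sum(weights)


def classify_effect(effect_size: float, margin: float = 0.1) -> str:
    """Three-way classification; a positive ES favours statistical prediction."""
    if effect_size < -margin:
        return "favours clinical prediction"
    if effect_size > margin:
        return "favours statistical prediction"
    return "relatively equal"


# Invented example: two studies, the first more precise than the second.
effect_sizes = [0.2, 0.0]
variances = [0.01, 0.04]
print(weighted_mean_effect(effect_sizes, variances))   # precise study dominates
print([classify_effect(es) for es in effect_sizes])
```

In this invented example the first study receives four times the weight of the second, so the summary effect lies closer to 0.2 than to 0.0. The classification approach, by contrast, counts each study once regardless of its precision.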
The authors used meta–regression to evaluate the effect in certain subgroups, such as year of publication, study design or type of setting (general medicine, mental health, education, etc.) and concluded that they did not find any exception to the general equivalence or superiority of statistical prediction. However, it is not clear from the report whether the study had enough power to evaluate the effect in these different subgroups. Another limitation of this study was that the authors did not evaluate or discuss the possibility of reporting bias.
Evaluation of prognostic models
There are two main levels to evaluate prognostic models. First we want to know if the model performance, in terms of discrimination and calibration, works satisfactorily for patients other than those from whom the data were derived. This is called 'validation' of the model. The other level refers to the evaluation of the model in terms of change in behaviour of medical doctors (medical management) or changes in patient outcome. Some authors refer to this as the 'impact' of the model.
Several guidelines have been proposed for the development and evaluation of prognostic models. The most recent one was proposed by Reilly and collaborators, who defined five stages:
1) Derivation of the prognostic model: Identification of the predictors for a multivariable model.
2) Narrow validation: Assessment of the accuracy of the prognostic model in one setting.
3) Broad validation: Assessment of the accuracy of the prognostic model in varied settings.
4) Narrow impact analysis of prognostic model used as decision rule: Prospective demonstration that the prognostic model improves physicians' decisions in one setting.
5) Broad impact analysis of the prognostic model used as decision rule: Prospective demonstration that the prognostic model improves physicians' decisions in varied settings.
According to Reilly and collaborators, the last two stages (impact analysis) should only be applied to clinical decision rules (those prognostic models that recommend a diagnostic or therapeutic action according to the estimated probability), and they also consider that randomised controlled trials are the ideal study design for these two stages.15 Other authors consider that even prognostic models that do not provide a course of action should be evaluated through randomised controlled trials, while for others their evaluation could be restricted to the validation stages.12
To the best of my knowledge, the only randomised clinical trial evaluating the use of a prognostic model (that does not provide a course of action) was the SUPPORT study (Study to Understand Prognoses and Preferences for Outcomes and Risks of Treatments). This study enrolled 8,329 seriously ill adult patients with a 50% chance of death within six months.30 In a first phase, including 4,301 patients, a prognostic model to estimate 180-day mortality was developed and, in a second phase, including 4,028 patients, the investigators randomly allocated half of the physicians to receive the prognostic model estimates and the patients' preferences for end-of-life care. In this study the discrimination of the physicians and of the model was identical (area under the receiver operating characteristic curve 0.78), but the physicians' predictions were less well calibrated than those of the prognostic model. The best discrimination was obtained when combining the physicians' and the prognostic model's estimates (area under the receiver operating characteristic curve 0.82). The study did not find a difference in physicians' performance or in patients' outcomes.
However, other studies have found different results. Murray and collaborators studied 1,025 patients with severe TBI, with the objective of evaluating whether providing doctors with computer-based predictions influenced patient management.31 Consistent with their prior hypothesis, there was a decrease of 39% in the use of intensive management (including osmotic diuretics, ventilation and intracranial pressure monitoring) in patients with the worst prognosis. Among the limitations of this study, it should be mentioned that it used a before/after design.
The results of the SUPPORT study were unexpected and discouraging for those advocating the use of prognostic models. However, these results do not necessarily mean that every prognostic model would be ineffective. Other studies, such as the one by Murray and collaborators, showed different results, and it can be argued that the impact of prognostic models varies according to the context in which they are applied. Their impact will be determined not only by their accuracy but also by the following contextual variables:
Users: How much doctors believe in the prognostic model and incorporate its predictions into their practice is of paramount importance. There is some evidence that models which are 'home grown' facilitate implementation.32
Setting: In settings with scarce resources doctors will need to prioritise among patients and it is plausible that accurate prognostic information could be more useful.16
Condition: The impact on patient outcome is related to the evidence of the effectiveness of interventions according to baseline risk. For example, the evidence for interventions according to risk in primary prevention in cardiology is well established, so prognostic estimates can be easily translated into treatment recommendations.33
Taking into account the previous considerations, some authors argue that prognostic models should be developed to be accurate and that their impact will vary according to the context in which they are applied. As Kellett stated in a recent paper, '...it is unlikely that (prognostic models) worsen clinical judgment. Therefore a good physician should no more refuse to use them than a good driver should refuse to use his car's headlights at night'.6
BIBLIOGRAPHIC REFERENCES
1. Windeler J. Prognosis – what does the clinician associate with this notion? Stat Med 2000; 19: 425–430.
2. Hayden JA, Cote P, Steenstra IA, Bombardier C. Identifying phases of investigation helps planning, appraising, and applying the results of explanatory prognosis studies. J Clin Epidemiol 2008; 61: 552–560.
3. Hemingway H. Prognosis research: why is Dr. Lydgate still waiting? J Clin Epidemiol 2006; 59: 1229–1238.
4. Christakis NA. Death foretold: prophecy and prognosis in medical care. Chicago IL: University of Chicago Press, 1999.
5. Christakis NA. The ellipsis of prognosis in modern medical thought. Soc Sci Med 1997; 44: 301–315.
6. Kellett J. Prognostication – the lost skill of medicine. Eur J Intern Med 2008; 19: 155–164.
7. Christakis NA, Iwashyna TJ. Attitude and self-reported practice regarding prognostication in a national sample of internists. Arch Intern Med 1998; 158: 2389–2395.
8. Altman DG, Royston P. What do we mean by validating a prognostic model? Stat Med 2000; 19: 453–473.
9. Altman DG. Systematic reviews in health care: Systematic reviews of evaluations of prognostic variables. BMJ 2001; 323: 224–228.
10. Wasson JH, Sox HC, Neff RK, Goldman L. Clinical prediction rules. Applications and methodological standards. N Engl J Med 1985; 313: 793–799.
11. Stiell IG, Wells GA. Methodological standards for the development of clinical decision rules in emergency medicine. Ann Emerg Med 1999; 33: 437–447.
12. Wyatt JC, Altman DG. Commentary: Prognostic models: clinically useful or quickly forgotten? BMJ 1995; 311: 1539–1541.
13. Laupacis A, Sekar N, Stiell IG. Clinical prediction rules. A review and suggested modifications of methodological standards. JAMA 1997; 277: 488–494.
14. Redelmeier DA, Lustig AJ. Prognostic indices in clinical practice. JAMA 2001; 285: 3024–3025.
15. Reilly BM, Evans AT. Translating clinical research into clinical practice: impact of using prediction rules to make decisions. Ann Intern Med 2006; 144: 201–209.
16. Rothwell PM. Prognostic models. Pract Neurol 2008; 8: 242–253.
17. Rothman KJ, Greenland S, Lash TL. Modern Epidemiology. 3rd ed. Philadelphia: Lippincott Williams & Wilkins; 2008.
18. Justice AC, Covinsky KE, Berlin JA. Assessing the generalizability of prognostic information. Ann Intern Med 1999; 130: 515–524.
19. Steyerberg EW. Clinical Prediction Models. New York: Springer, 2009.
20. Bartfay E, Bartfay WJ. Accuracy assessment of prediction in patient outcomes. J Eval Clin Pract 2008; 14: 1–10.
21. McGeechan K, Macaskill P, Irwig L, Liew G, Wong TY. Assessing new biomarkers and predictive models for use in clinical practice: a clinician's guide. Arch Intern Med 2008; 168: 2304–2310.
22. Poses RM, Bekes C, Copare FJ, Scott WE. The answer to 'What are my chances, doctor?' depends on whom is asked: prognostic disagreement and inaccuracy for critically ill patients. Crit Care Med 1989; 17: 827–833.
23. Glare P, Virik K, Jones M, Hudson M, Eychmuller S, Simes J, et al. A systematic review of physicians' survival predictions in terminally ill cancer patients. BMJ 2003; 327: 195–198.
24. Brandt HE, Ooms ME, Ribbe MW, van der Wal G, Deliens L. Predicted survival vs. actual survival in terminally ill noncancer patients in Dutch nursing homes. J Pain Symptom Manage 2006; 32: 560–566.
25. Pignone M, Phillips CJ, Elasy TA, Fernandez A. Physicians' ability to predict the risk of coronary heart disease. BMC Health Serv Res 2003; 3: 13. Available at http://www.biomedcentral.com/1472-6963/3/13 Accessed 12 August 2009.
26. Knaus WA, Wagner DP, Lynn J. Short-term mortality predictions for critically ill hospitalized adults: science and ethics. Science 1991; 254: 389–394.
27. Liao L, Mark DB. Clinical prediction models: are we building better mousetraps? J Am Coll Cardiol 2003; 42: 851–853.
28. Grove WM. Clinical versus statistical prediction: the contribution of Paul E. Meehl. J Clin Psychol 2005; 61: 1233–1243.
29. Grove WM, Zald DH, Lebow BS, Snitz BE, Nelson C. Clinical versus mechanical prediction: a meta-analysis. Psychol Assess 2000; 12: 19–30.
30. Anonymous. A controlled trial to improve care for seriously ill hospitalized patients. The study to understand prognoses and preferences for outcomes and risks of treatments (SUPPORT). The SUPPORT Principal Investigators. JAMA 1995; 274: 1591–1598.
31. Murray GD, Murray LS, Barlow P, Teasdale GM, Jennett WB. Assessing the performance and clinical impact of a computerized prognostic system in severe head injury. Stat Med 1986; 5: 403–410.
32. Garg AX, Adhikari NK, McDonald H, Rosas-Arellano MP, Devereaux PJ, Beyene J, et al. Effects of computerized clinical decision support systems on practitioner performance and patient outcomes: a systematic review. JAMA 2005; 293: 1223–1238.
33. Jackson J. Primary prevention of cardiovascular disease: the absolute-risk-based approach. In: Lancet T, ed. Treating individuals: from randomised trials to personalised medicine, first edition, London, 2007.
NOTES
* Kivimäki M, Batty GD, Singh-Manoux A, Ferrie JE, Tabak AG, Jokela M, et al. Validating the Framingham Hypertension Risk Score. Results From the Whitehall II Study. Published online before print July 13, 2009, doi: 10.1161/HYPERTENSIONAHA.109.132373. Abstract available at http://hyper.ahajournals.org/cgi/content/abstract/HYPERTENSIONAHA.109.132373v1 Accessed 12 August 2009.
1 Nutrition and Public Health Interventions Research Unit, Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, United Kingdom.