Biomédica

Print version ISSN 0120-4157

Biomédica vol.38 no.2 Bogotá Jan./June 2018

https://doi.org/10.7705/biomedica.v38i0.3648 

ORIGINAL ARTICLE

Reporting of statistical regression analyses in Biomédica: A critical assessment review

Reporte estadístico en los análisis de regresión en Biomédica: una revisión y evaluación crítica

Julián Alfredo Fernández-Niño1 

Rosa Ivonne Hernández-Montes2 

Laura Andrea Rodríguez-Villamizar1 

1Departamento de Salud Pública, Facultad de Salud, Universidad Industrial de Santander, Bucaramanga, Colombia

2Escuela de Salud Pública de México, Instituto Nacional de Salud Pública, Cuernavaca, México


ABSTRACT

Introduction:

Regression modeling is a statistical method commonly used in health research, especially by observational studies.

Objective:

The objectives of this paper were to 1) determine the frequency of reporting of regression modeling in original biomedical and public health articles that were published in Biomédica between 2000 and 2017; 2) describe the parameters used in the statistical models, and 3) describe the quality of the information reported by the studies to explain the statistical analyses.

Materials and methods:

We conducted a critical assessment review of all original articles published in Biomédica between 2000 and 2017 that used regression models for the statistical analysis of the studies’ main objectives. We generated a 20-item checklist based on four good practice guidelines for the presentation of statistical methods.

Results:

Most of the studies were observational studies related to public health sciences (65.7%). Less than half (37.2%) of them reported using a combination of conceptual frameworks and statistical criteria for the selection of variables to be included in the regression model. Less than one quarter (22.1%) reported the verification of the assumptions of the model. The most frequently used uncertainty measure was the p-value (73.5%).

Conclusion:

There are significant limitations in the quality of the reporting of statistical regression models, which reviewers and readers need in order to correctly assess and interpret the models. These results are offered as an invitation to researchers, reviewers, and editors of biomedical journals to develop, promote, and control an appropriate culture for statistical analysis and reporting in Colombia.

Key words: Biostatistics; data analysis; regression analysis; bias (epidemiology); Colombia

RESUMEN (English translation)

Introduction.

Regression models are statistical methods commonly used in health research, especially in observational studies.

Objectives.

To determine the frequency of use of regression models in the original biomedical and public health articles published in Biomédica between 2000 and 2017, and to describe the parameters used in the statistical models as well as the quality of the information reported by the studies to explain the statistical analysis.

Materials and methods.

We conducted a critical review and assessment of all original articles published in Biomédica between 2000 and 2017 that used regression models in the statistical analysis. A 20-item checklist was built on the basis of four good practice guidelines for the presentation of statistical methods.

Results.

Most of the included studies were observational studies related to the public health sciences (65.7%). Fewer than half (37.2%) reported using a combination of a conceptual framework and statistical criteria to select the variables included in the regression model; fewer than one quarter (22.1%) reported verifying the model assumptions, and the most frequently reported measure of uncertainty was the p-value (73.5%).

Conclusion.

There are important limitations in the quality of the reporting of statistical regression models, which reviewers and readers need in order to correctly assess and interpret the models. The results are offered as an invitation to researchers, reviewers, and editors of biomedical journals to promote the development of an appropriate culture of statistical analysis and reporting in Colombia.

Key words: biostatistics; data analysis; regression analysis; bias (epidemiology); Colombia

Evidence-based medicine and public health currently serve as the predominant paradigms in the provision of health services 1. Primary, secondary, and tertiary prevention interventions for specific diseases are largely based on clinical guidelines, while public health interventions to address health problems at a population level are largely based on proven successful programs 2. Both clinical guidelines and public health programs rely mostly on quantitative research that provides evidence of the efficacy, effectiveness, and efficiency of interventions, and indicate the strength of associations between risk factors and health outcomes. Contrary to the principles of evidence-based medicine, the quality of the evidence presented by health research is usually assessed exclusively according to the type of study design 3. This widely used approach assumes that the statistical methods related to a specific study design are unbiased and free of random error and misspecification.

Regression modeling is a statistical method commonly used in health research, especially in observational studies. The selection of the type of regression model, the selection and inclusion of model variables, and the assessment of model diagnostics are some of the key actions that should be specified and properly conducted in order to obtain valid statistical results 4. The absence or misuse of an appropriate regression modeling technique could lead to erroneous results and conclusions. Although regression modeling plays a central role in quantitative health research, little attention has been given to the appropriateness of the statistical methods used by epidemiological studies to reach results and conclusions 5. In Colombia, the national bibliographic index Publindex has ranked Biomédica as a top journal in the health sciences field 6, and, therefore, a large amount of original health research performed in the country and the region has been published in this journal.

The objectives of this paper were to 1) determine the frequency with which regression modeling was reported in original biomedical and public health articles published in Biomédica between 2000 and 2017 (including online ahead of print); 2) describe the reporting of the parameters and procedures used for the statistical models, and 3) describe the quality of the information reported by the studies to explain and evaluate the statistical analyses. We also aimed to provide an overview of the statistical quality of regression modeling reported in the health literature published in Biomédica and, based on those reports, to identify the strengths and limitations of the statistical methods.

Materials and methods

We reviewed all original articles published in Biomédica between 2000 and 2017 (including online ahead of print publications in the 2016 and 2017 issues), which are available at Biomédica’s official website (www.revistabiomedica.org). We then selected the articles that reported the use of at least one statistical regression model, according to the following selection criteria: 1) original articles that used one or more regression analyses; 2) regression analyses that were conducted in order to address the study’s main objective, and 3) regression analyses that were presented in the methods and results sections. When an original study included two or more regression analyses, we selected the regression model that was related to the study’s main objective. When a study had two or more main objectives, we selected one model per objective. Therefore, we included more than one regression model per original article when the models corresponded to different main objectives and specifications.

We conducted a literature review to identify papers that assessed the quality of statistical reporting in biomedical journals and found four publications that presented good practice guidelines for the presentation of statistical methods 5,7-9. The first publication appeared in 1992 when the International Committee of Medical Journal Editors (ICMJE) issued a set of publication requirements for biomedical journals. Then, in order to better address the needs of readers, the ICMJE proposed the first standards for the presentation of statistical information, aimed at supporting and informing authors and editors about the statistical principles behind the studies published in biomedical journals 7. The second article was published in 2013 and its purpose was to develop a checklist to determine the frequency of the use of regression models in economics, the parameters used, and the amount of information reported. That checklist grouped items according to four consecutive stages of the statistical process 8. Later, the proposal titled “Strategies for the Development of Statistical Regression Models” described important points to consider when using a regression model, such as data manipulation, modeling strategies, final model evaluation, and presentation 9. Lastly, Lang and Altman recently published a general guideline for reporting methods and statistical analyses in biomedical journals, which included 12 items specific to regression analyses 5.

Some of the important items in one or two of the checklists mentioned were missing from the others, and none of the lists was comprehensive enough to assess the quality of the presentations of the statistical regression models. Therefore, we consolidated all of the items in the four publications relevant for the evaluation of regression modeling into one checklist containing 20 statistical procedures or parameters. These were evaluated based on the four different stages proposed by Kearns, et al. 8: I) pre-statistical modeling considerations; II) specification of the final model; III) presentation of the final model; and IV) validation of the final model.

In general, the guidelines on good practices for statistical reporting 5,7,8, and the checklist based on these guidelines used for this critical assessment, can be summarized by the following standards: 1) agreement of the statistical method with the measurement scale and data structure of the study variables; 2) coherence between the statistical method and the study objective; 3) verification of the statistical model’s assumptions and understanding of the theoretical basis of the statistical tests used; 4) assessment of the diagnostics and goodness of fit of the final model according to the regression method used; 5) evaluation of the final model and its comparison with alternative models, and 6) discussion of the strengths and limitations of the final model.

The checklist used for this work was reviewed by two experts in statistical analyses, and then adapted based on their recommendations. Appendix 1 presents the full checklist and Appendix 2 compares the four checklists with the items included in the one used for this review. It is important to mention that we did not evaluate or validate the regression models themselves but only the reporting, according to each stage of the process.

Two independent reviewers with statistical and biomedical experience performed a critical assessment of the 20 items included in the checklist. The reviewers were blinded to the authorship of the original articles. In addition, a third reviewer was assigned to assess three articles that were written by the other reviewers.

Most of the checklist items were dichotomized as “yes/no” responses, and accompanying observations and explanations about the selections were included. A third reviewer resolved disagreements. We used Cohen’s kappa statistic to assess agreement between the independent reviewers’ responses and, since the checklist contained several items, we calculated a pooled (inter-rater) kappa across items. We present a descriptive analysis of the results using frequencies grouped by six-year periods. We conducted the statistical analyses using Stata 12™ (Stata Corporation, College Station, TX, USA).
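As an illustration of the agreement statistics mentioned above, the following is a minimal Python sketch of Cohen's kappa for one yes/no checklist item and of a pooled kappa across several items. The article does not state which pooling formula was used; this sketch assumes the common estimator that averages the observed and the chance-expected agreement over items before combining them. The function names and the ratings are hypothetical and are not taken from the review's data.

```python
from collections import Counter


def agreement(rater_a, rater_b):
    """Return (observed, chance-expected) agreement for one checklist item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal proportions per category.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return observed, expected


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same set of articles."""
    observed, expected = agreement(rater_a, rater_b)
    if expected == 1.0:
        # Kappa is undefined when both raters used one identical category;
        # this sketch returns 1.0 because agreement is perfect in that case.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def pooled_kappa(items):
    """Pooled kappa (assumed estimator): average agreement terms over items first."""
    pairs = [agreement(a, b) for a, b in items]
    mean_obs = sum(o for o, _ in pairs) / len(pairs)
    mean_exp = sum(e for _, e in pairs) / len(pairs)
    if mean_exp == 1.0:
        return 1.0
    return (mean_obs - mean_exp) / (1.0 - mean_exp)


# Hypothetical ratings of ten articles on two checklist items.
item1_a = ["yes"] * 5 + ["no"] * 5
item1_b = ["yes"] * 4 + ["no"] * 6
item2_a = ["yes"] * 5 + ["no"] * 5
item2_b = ["yes"] * 5 + ["no"] * 5

print(round(cohen_kappa(item1_a, item1_b), 3))  # 0.8
print(round(pooled_kappa([(item1_a, item1_b), (item2_a, item2_b)]), 3))  # 0.9
```

Averaging the agreement terms before forming the ratio keeps a single item with very unbalanced responses from dominating the summary, which is one reason a pooled estimate may be preferred over a simple mean of per-item kappas.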

Results

We reviewed 755 original articles published in Biomédica during the study period. Regression analyses were used in 163 articles (21.6%) and after exclusions, 108 original studies and 113 regression models were included in the review (figure 1). Appendix 3 presents a full list of the articles included. The review process had an inter-rater agreement of 0.94.

Figure 1 Flow diagram of the process to select the studies included in this review 

Most of the studies were related to public health sciences (65.7%) and the remaining ones to biomedical sciences. In terms of study design, cross-sectional studies were the most frequent (38.9%), followed by cohort studies (23.2%). Table 1 presents the characteristics of the studies.

Table 1 Characteristics of the included original studies that used regression models by six-year periods, Biomédica, 2000-2017 

*Including online ahead-of-print publications for 2016 and 2017 issues

** Analysis of secondary and/or administrative sources without a specified design (more than two sources or information systems)

With regard to stage I, “pre-statistical modeling considerations”, 37.2% of the models reported using a combination of conceptual frameworks and statistical criteria to select the variables to be included in the regression models. Among the models that used statistical procedures, alone or combined with conceptual frameworks, to select the variables (n=49), the stepwise strategy was the most frequently reported (50%). The logistic regression model was the most frequent type of model (56.6%), followed by the linear regression model (25.7%). A small proportion of studies (8.9%) mentioned “missing data” in their reporting of data quality. Table 2 presents the reporting of the statistical information related to the regression models according to checklist stages, items, and six-year periods.

Table 2 Information about regression models as reported by the original studies by six-year periods, Biomédica, 2000-2017 

* Including online ahead-of-print publications for 2016 and 2017 issues; ** For this table, the unit of analysis is the regression model; *** This item only applies to studies that used logistic regression: periods 2000-2005 (n=7), 2006-2011 (n=8), 2012-2017 (n=14), and n=29 for the total of periods.

With respect to stage II, “specification of the final model”, 22.1% of the models reported the verification of the assumptions of the model, with normality and linearity being the most and least frequently verified assumptions (65.0% and 12.0%, respectively). Model comparisons (9.7%) and model equations (8.0%) were the least frequent items reported by the studies.
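To make the idea of verifying an assumption concrete, here is a minimal Python sketch of one possible normality check on the residuals of a simple linear regression, using the Jarque-Bera statistic. This is only an assumed example of such a check; the reviewed articles may have used other procedures (for instance graphical methods or other tests), and the function names are hypothetical. The p-value relies on the large-sample chi-square approximation with 2 degrees of freedom, whose upper tail is exp(-JB/2), so it should be read with caution in small samples.

```python
import math


def simple_ols_residuals(x, y):
    """Residuals from a least-squares fit of y on a single predictor x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def jarque_bera(residuals):
    """Return (JB statistic, asymptotic p-value) for a normality check."""
    n = len(residuals)
    if n < 3:
        raise ValueError("need at least three residuals")
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    if m2 == 0:
        raise ValueError("residuals have zero variance")
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Upper tail of a chi-square distribution with 2 degrees of freedom.
    p_value = math.exp(-jb / 2.0)
    return jb, p_value


# Hypothetical data: fit a line, then check the residuals.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [1.1, 2.9, 5.2, 6.8, 9.1, 11.0, 12.8, 15.3]
jb, p = jarque_bera(simple_ols_residuals(x, y))
print(f"JB = {jb:.3f}, asymptotic p = {p:.3f}")
```

Reporting both the check that was run and its result is what the corresponding checklist items ask for, whichever diagnostic is chosen.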

With regard to stage III, “presentation of the final model”, 90.3% of the studies reported regression coefficients. The most frequently reported uncertainty measure was the p-value (73.5%), and its reporting increased over the six-year periods.
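The checklist asks whether coefficients are accompanied by confidence intervals and/or p-values. As a hedged illustration of what an interval adds to a p-value, the sketch below computes an odds ratio with a Wald 95% confidence interval from a 2x2 table, which corresponds to a logistic regression with a single binary predictor. The counts are hypothetical and the function name is an assumption made for this example.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed cases, b: exposed non-cases,
    c: unexposed cases, d: unexposed non-cases.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")
    log_or = math.log((a * d) / (b * c))
    # Standard error of the log odds ratio (Woolf's formula).
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    return math.exp(log_or), lower, upper


# Hypothetical counts: 20/100 exposed and 10/100 unexposed subjects are cases.
or_hat, lower, upper = odds_ratio_ci(20, 80, 10, 90)
print(f"OR = {or_hat:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

With these hypothetical counts the point estimate is 2.25, but the interval runs from just under 1 to about 5, which conveys both the direction of the association and how imprecise the estimate is; a p-value alone would only indicate that the result falls close to the conventional 0.05 threshold.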

For stage IV, “validation of the final model”, 46% of the studies reported the goodness of fit of the model and 1.7% reported on tests for model performance. Regression models were discussed in only 15.9% of the articles.

In terms of the frequency of reporting by six-year periods, a decreasing pattern was found in the reporting of the following checklist items: sufficient explanations for all variables used in the analysis, the alpha parameter for variable inclusion, missing data, bias assessment for data quality, verification of the model’s assumptions, and goodness-of-fit measures. The discussion of the statistical models presented in the studies also decreased over time. In contrast, there was an increase in reporting on the combined use of conceptual frameworks and statistical criteria for the inclusion of variables in models of association.
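The descriptive analysis by period can be pictured with a short Python sketch that tallies the percentage of "yes" responses to one checklist item within each six-year period. The period boundaries follow those given in the tables (2000-2005, 2006-2011, 2012-2017); the record layout, the item key, and the example records are hypothetical and do not reproduce the review's data, which were analyzed in Stata.

```python
from collections import defaultdict

PERIODS = [(2000, 2005), (2006, 2011), (2012, 2017)]


def period_of(year):
    """Map a publication year to its six-year period label."""
    for start, end in PERIODS:
        if start <= year <= end:
            return f"{start}-{end}"
    raise ValueError(f"year {year} is outside the review period")


def reporting_frequency(models, item):
    """Percentage of regression models reporting `item`, by period."""
    counts = defaultdict(lambda: [0, 0])  # period -> [yes, total]
    for model in models:
        label = period_of(model["year"])
        counts[label][0] += 1 if model[item] else 0
        counts[label][1] += 1
    return {
        label: round(100.0 * yes / total, 1)
        for label, (yes, total) in sorted(counts.items())
    }


# Hypothetical records: one dictionary per regression model reviewed.
models = [
    {"year": 2001, "assumptions_verified": True},
    {"year": 2003, "assumptions_verified": False},
    {"year": 2010, "assumptions_verified": True},
    {"year": 2015, "assumptions_verified": False},
]
print(reporting_frequency(models, "assumptions_verified"))
```

Using the regression model rather than the article as the unit of analysis matches the note under Table 2.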

Discussion

We conducted a critical assessment review of original studies that used statistical regression models and were published in Biomédica between 2000 and 2017. The results show significant weaknesses in the reporting of the statistical information or parameters needed by reviewers and readers in order to correctly assess and interpret statistical models. While the number of studies that use regression models has increased since 2000, the descriptive analysis which we performed by six-year periods showed that rather than having improved, the reporting of statistical information has worsened over time for a large number of key items in the checklist.

According to Publindex, Biomédica is currently the journal that has the highest impact on biomedicine, public health, and epidemiology in Colombia 6. While Colombian research groups have published an increasing number of studies in international journals, Biomédica continues to be the preferred journal for publishing the results of research projects with national and regional impact, and its visibility has increased over recent decades 10. Therefore, Biomédica provides an appropriate framework to analyze the quality of statistical reporting in Colombia.

Previous reports have found weaknesses in the training in statistical competencies offered by postgraduate epidemiological and public health programs in Colombia 11. To some extent, the weakness in statistical skills at the postgraduate level might be related to the misuse and simplification of statistical methods, including generalized linear regression methods 12. This review used the available guidelines on good practice for statistical reporting to describe reports of regression methods that have appeared in Biomédica over the last 17 years since these are the most common statistical methods used in observational studies. The results of this review show that most of the included studies did not meet the standards of the guidelines and the checklist used herein to assess the quality of the reporting of statistical regression modeling methods.

Good practice guidelines for statistical reporting have been established mainly by expert consensus in order to provide guidance on how to improve the quality of reporting on statistical methods 5. However, these guidelines should not be assumed as a simplification of statistical methods or used as “recipes” to conduct statistical analyses. Strictly following the guidelines does not guarantee high quality regression analyses. Assessments of the quality of statistical analyses should be comprehensive, based on statistical rationale, and meet the statistical standards discussed above.

Most of the original articles included in this review are observational studies which used regression models to control for confounders 13. The results of the analysis of “pre-statistical model considerations”, or stage I, showed an increase over time in the use of combined methods to select regression model variables for studies of association. Nevertheless, statistical criteria based exclusively on hypothesis testing methods continue to be used. Such criteria, as well as stepwise methods, have been strongly criticized in recent decades 4,14, and modern epidemiology widely recognizes that model variables should be selected according to a more comprehensive analytical process, based on theoretical and literature reviews and using multidimensional criteria 15.
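One commonly cited concern with selecting variables purely by hypothesis tests can be shown with a small calculation. The article does not present this calculation; it is offered here only as a sketch, under the simplifying assumptions that the candidate variables are tested independently and that none of them is truly associated with the outcome. The function names are hypothetical.

```python
def prob_spurious_selection(n_candidates, alpha=0.05):
    """Probability that at least one null variable passes the alpha threshold."""
    if n_candidates < 0 or not 0 < alpha < 1:
        raise ValueError("need n_candidates >= 0 and 0 < alpha < 1")
    return 1.0 - (1.0 - alpha) ** n_candidates


def expected_spurious_variables(n_candidates, alpha=0.05):
    """Expected number of null variables that pass the alpha threshold."""
    return n_candidates * alpha


# With 20 unrelated candidate variables screened at alpha = 0.05:
print(round(prob_spurious_selection(20), 3))  # 0.642
print(expected_spurious_variables(20))
```

Under these assumptions, screening 20 unrelated candidates at the 0.05 level admits at least one of them into the model almost two times out of three, which helps explain why selection guided by subject-matter knowledge is recommended alongside statistical criteria.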

In statistics, a “misspecification error” refers to the use of incorrect procedures to build a regression model 16. The misspecification of a regression model can distort the estimators obtained and thereby bias the study results, and the importance of this type of error has been well acknowledged 17,18. It includes omitting influential explanatory variables or including non-influential ones, using incorrect functional forms, violating the assumptions of the model, and using incorrect approaches to choose the final model 19. In this regard, it is important to note that most of the articles reviewed herein were weak in their reporting of the verification of model assumptions, which are key criteria for the statistical validity of the models.
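The effect of omitting an influential explanatory variable can be illustrated with a small simulation. The following Python sketch is a hypothetical example, not an analysis from the article: the data-generating model, the coefficients (1.0 for x and 2.0 for z), and the function names are all assumptions chosen for illustration. The adjusted estimate is obtained by removing the linear effect of z from both x and y before regressing one set of residuals on the other, which gives the same coefficient for x as a regression of y on both variables.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares line of y on x (intercept included)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def residualize(target, control):
    """Residuals of target after removing its linear dependence on control."""
    n = len(target)
    slope = ols_slope(control, target)
    intercept = sum(target) / n - slope * sum(control) / n
    return [t - (intercept + slope * c) for t, c in zip(target, control)]


def simulate_omitted_variable(n=5000, seed=1):
    """Compare the slope for x when the confounder z is omitted vs adjusted for."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    z = [0.8 * xi + rng.gauss(0, 1) for xi in x]  # z is correlated with x
    y = [1.0 * xi + 2.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    naive = ols_slope(x, y)  # misspecified model: z omitted
    adjusted = ols_slope(residualize(x, z), residualize(y, z))
    return naive, adjusted


naive, adjusted = simulate_omitted_variable()
print(f"slope for x with z omitted:  {naive:.2f}")
print(f"slope for x adjusted for z:  {adjusted:.2f}")
```

In this setup the true coefficient of x is 1.0, yet the model that omits z is expected to return a slope near 1.0 + 2.0 × 0.8 = 2.6, because x absorbs part of the effect of the correlated, omitted variable.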

Critical assessment reviews are significantly limited in their ability to distinguish between the absence of an analysis and a lack of reporting, since quality is judged exclusively on the basis of published documents. However, even when statistical methods are used, not reporting them in itself weakens an original article in terms of demonstrating the validity of its methods and results. In the case of Biomédica, there are no constraints on the complete reporting of statistical methods since there is no word limit for original articles.

One important limitation of the present review is the use of the new checklist. Although two experts in statistical analyses reviewed it, it has not been widely reviewed by the scientific statistics community and, therefore, it cannot be considered a validated instrument. In addition, since the checklist assigned equal weight to all items, it does not take into account that some of the elements used to judge the quality of reporting may be more important than others. On the other hand, its strength is that it is a comprehensive list based on previous guidelines and checklists that have been specifically used to assess the quality of reporting on statistical regression modeling analyses.

In conclusion, this critical assessment review of original articles published over the last 17 years in a top biomedical journal in Colombia shows that there are important limitations in the quality of reporting on statistical regression models. An improvement over time in the use of multidimensional statistical procedures for the selection of model variables was evident. However, there was a lack of reporting on important statistical information related to the four stages of analysis, thereby creating uncertainty about the validity of the statistical models, which in turn calls into question study results and conclusions.

Finally, we invite researchers, reviewers, and editors of biomedical journals in Colombia to use the results of this critical assessment review to develop, promote, and control an appropriate culture for statistical analysis and reporting. Attaining this goal through the use of good practice guidelines for statistical reporting will lead to a more rational and efficient use of statistical methods, higher confidence and transparency in the peer-review process, and overall, higher quality biomedical, public health, and epidemiological studies in Colombia.

References

1. Jenicek M. Epidemiology, evidenced-based medicine, and evidence-based public health. J Epidemiol. 1997;7:187-97. https://doi.org/10.2188/jea.7.187

2. Tang JL, Griffiths S. Review paper: epidemiology, evidence-based medicine, and public health. Asia Pac J Public Health. 2009;21:244-51. https://doi.org/10.1177/1010539509335516

3. Burns PB, Rohrich RJ, Chung KC. The levels of evidence and their role in evidence-based medicine. Plast Reconstr Surg. 2011;128:305-10. https://doi.org/10.1097/PRS.0b013e318219c171

4. Ryan T. Modern Regression Methods. 2nd ed. Hoboken, N.J.: Wiley; 2009. p. 672.

5. Lang T, Altman D. Basic statistical reporting for articles published in clinical medical journals: the SAMPL guidelines. In: Smart P, Maisonneuve H, Polderman A, editors. Science Editors' Handbook. London: European Association of Science Editors; 2013.

6. Departamento Administrativo Nacional de Ciencia y Tecnología COLCIENCIAS. Publindex. Accessed: October 12, 2016. Available at: http://scienti.colciencias.gov.co:8084/publindex/

7. Bailar J, Mosteller F. La información estadística que deben proporcionar los artículos publicados en revistas médicas. Salud Públ Méx. 1992;34:103-15.

8. Kearns B, Ara R, Wailoo A, Manca A, Alava MH, Abrams K, et al. Good practice guidelines for the use of statistical regression models in economic evaluations. Pharmacoeconomics. 2013;31:643-52. https://doi.org/10.1007/s40273-013-0069-y

9. Núñez E, Steyerberg E, Núñez J. Estrategias para la elaboración de modelos estadísticos de regresión. Rev Esp Cardiol. 2011;64:501-7. https://doi.org/10.1016/j.recesp.2011.01.019

10. Gómez LA, Nicholls RS. Avances en visibilidad, acceso y reconocimiento internacional de la revista Biomédica. Biomédica. 2008;28:5-6.

11. Idrovo AJ, Fernandez-Nino JA, Bojorquez-Chapela I, Ruiz-Rodriguez M, Agudelo CA, Pacheco OE, et al. Perception of epidemiological competencies by public health students in Mexico and Colombia during the influenza A (H1N1) epidemic. Rev Panam Salud Publica. 2011;30:361-9. https://doi.org/10.1590/S1020-49892011001000010

12. Fernandez-Nino JA, Trejo-Valdivia B. Costumbres, mal uso y abuso en estadística. Rev Univ Ind Santander Salud. 2016;48:5-6.

13. Lu CY. Observational studies: a review of study designs, challenges and strategies to reduce confounding. Int J Clin Pract. 2009;63:691-7. https://doi.org/10.1111/j.1742-1241.2009.02056.x

14. Whittingham M, Stephens P, Bradbury R, Freckleton R. Why do we still use stepwise modelling in ecology and behaviour? J Anim Ecol. 2006;75:1182-9. https://doi.org/10.1111/j.1365-2656.2006.01141.x

15. Walter S, Tiemeier H. Variable selection: current practice in epidemiological studies. Eur J Epidemiol. 2009;24:733-6. https://doi.org/10.1007/s10654-009-9411-2

16. Goff L. The bias from misspecification of control variables as linear. 2014. Accessed: October 12, 2016. Available at: http://www.rff.org/files/sharepoint/WorkImages/Download/RFF-DP-14-41.pdf

17. Rao P. Some notes on misspecification in multiple regressions. The American Statistician. 1971;25:37-9.

18. Ramsey JB. Test for specification errors in classical linear least-squares regression analysis. The Journal of the Royal Society Series B (Methodological). 1969;31:350-71.

19. Hall S, Asteriou D. Applied Econometrics. 2nd edition. New York: Palgrave Macmillan; 2011.

Funding: This work was financed by the Departamento de Salud Pública, Facultad de Salud, Universidad Industrial de Santander, and the Instituto Nacional de Salud Pública de México.

Authors’ contributions: All the authors participated in all the phases of the study.

Appendix 1. Checklist of the dimensions and/or statistical procedures reported

Stage I “Pre-statistical modelling considerations”

  • 1. Have the objectives of the analysis been stated?

  • 2. What is the type of regression analysis?

  • 3. Is the sample size reported for every model presented?

  • 4. Are there sufficient explanations of all variables?

  • 5. What is the measurement scale of the response variable?

  • 6. Is the transformation of variables presented or reported?

  • 7. Is there any mention of the process for selecting the variables included in the final model presented? If so, what was this process?

  • 8. Does the paper mention any statistical strategy for variable selection?

  • 9. Does the paper use a stepwise method? If so, forward or backward?

  • 10. Does the paper mention any alpha criteria for including variables?

  • 11. Has the quality of data (missing values, outliers, possible bias, etc.) been described?

  • 12. Was the statistical package or program used in the analysis identified?

Stage II “Specification of the final model”

  • 13. Have any modeling assumptions been stated?

  • 14. Which assumptions were verified?

  • 15. Was collinearity evaluated?

  • 16. Does the paper mention the strategy for assessing assumptions?

  • 17. Did the study compare models? If so, how?

  • 18. Does the paper report the regression equation?

Stage III “Presentation of the final model”

  • 19. Are the results reported graphically and/or in tables?

  • 20. Are the regression coefficients (beta) reported for each explanatory variable? Are the corresponding confidence intervals and/or p-values presented?

Stage IV “Validation of the final model”

  • 21. Does the paper conduct a goodness of fit analysis?

  • 22. Are the tests of goodness of fit mentioned?

  • 23. Does the analysis include a test of the model’s performance?

  • 24. Are the statistical analyses discussed?

Appendix 2. Checklist of the dimensions and/or statistical procedures reported: comparison of guidelines/checklists for presenting statistical regression analyses from the literature reviewed for the comprehensive checklist used by this analysis

*The items in bold in each guideline/checklist were the ones related specifically to the regression models used to generate this paper’s checklist. For these items, the item number corresponding to this paper’s checklist is indicated in parentheses and italics. The gray items were not included in this paper’s checklist as they were general guidelines related to the study design and not specifically related to the presentation of regression models. Some items were excluded because they were related to terminology (Núñez’s item 2) or advanced statistical methods (Núñez’s item 3) used in economics but not commonly used in public health or biomedicine. The items excluded from Lang and Altman’s principles corresponded to a detailed explanation of the treatment of missing values, an aspect that was already included in this paper’s checklist as a presence/absence item.

Appendix 3. Studies included in the review

Received: October 18, 2016; Accepted: August 08, 2017

*Corresponding author: Laura Andrea Rodríguez-Villamizar, Escuela de Medicina, Facultad de Salud, Universidad Industrial de Santander, Carrera 32 N° 29-31, oficina 301, Bucaramanga, Colombia. Telephone: (577) 634 4000, extension 3195. E-mail: laurovi@uis.edu.co

Conflicts of interest

The authors declare that they have no conflicts of interest.

This is an open-access article distributed under the terms of the Creative Commons Attribution License.