Introduction
The concept of papillary thyroid carcinoma (PTC) encompasses a group of epithelial neoplasms of malignant biological behavior that present evidence of histogenesis from the follicular epithelium of the thyroid gland and a set of distinctive nuclear features1. Regarding its incidence, when analyzing the statistics reported in the last 20 years by the American Cancer Society, there are two epidemiological phenomena. First, since 2000 and up to 2016, a 350% increase in the diagnosis of PTC was observed2; this is explained by the increase in the diagnosis of papillary microcarcinomas, a phenomenon attributable to the implementation of the Thyroid Imaging Reporting and Data System and the Bethesda System for Reporting Thyroid Cytopathology3. Second, since 2016 until now, a 39% decrease in the diagnosis of PTC was observed, from 3.8% to 2.3%4. One of the hypotheses that seems to explain this phenomenon is attributed to the recategorization of thyroid neoplasms in 2016 by the World Health Organization1, and particularly to the redefinition of “encapsulated, non-infiltrating follicular neoplasm with nuclear changes similar to papillary carcinoma”, as a neoplasm with benign biological behavior5.
Hashimoto disease (HD) corresponds to a spectrum of autoimmune thyroid disease, in which there are positive serum titers for antibodies against specific thyroid proteins, always associated with clinical evidence of thyroid disease6. It is considered the most frequent cause of hypothyroidism and hyperthyroidism in areas of the world with adequate and exaggerated iodine intake, respectively7. Its age of highest incidence is between the fifth and sixth decade of life, and it is considered a female disease, with a female:male ratio of at least 7:18. HD can be associated with microscopic chronic lymphocytic thyroiditis (CLT) -compatible changes, consisting of diffuse replacement of the thyroid parenchyma by a mononuclear infiltrate, formed by lymphocytes and plasmacytes grouped in lymphoid follicles with germinal centers, associated with a variable degree of follicular atrophy, squamous and oncocytic metaplasia9.
The first publication found in the literature reporting the frequency of PTC in specimens with CLT was that of Lindsay et al in 1952, who documented a prevalence of 21%10. However, the hypothesis of a higher prevalence of PTC in specimens with CLT was first evaluated analytically by Dailey et al in 195511. Three methodological approaches suggesting a possible association between CLT and PTC, all based on the retrospective study of surgical specimens, can be found in the literature. Two of them are descriptive and cross-sectional, and determine the prevalence of PTC in specimens with CLT, or vice versa. Their limitation is the absence of comparison groups, typical of the analytical approach, which prevents the use of association statistics. The third is retrospective, cross-sectional and analytical, like the one proposed by Dailey et al, and compares the prevalence of PTC in specimens with and without CLT-compatible changes; when the hypothesis is evaluated in this way, an association seems to be suggested. In contrast, analytical, prospective and cross-sectional studies, in which nodular lesions are evaluated by ultrasound and cytology in patients with and without HD, regardless of the presence or absence of changes compatible with CLT, in order to document the presence or absence of a PTC, have not confirmed the hypothesis suggested by retrospective studies and instead suggest that there is no real increase in risk12-14.
Five meta-analyses in the literature evaluate this hypothesis. The results of Singh et al15 with 11 studies up to 1999, Lee et al16 with 35 studies up to 2013, Lai et al17, and Resende et al18 with 47 studies up to 2017 suggest that there is a higher prevalence of PTC in specimens with CLT-compatible changes. However, Jankovic et al19, with 35 studies up to 2013, in addition to suggesting such an association, propose that their results may be masked by a probable selection bias. We consider that 14 of the 47 articles evaluated in these studies do not meet methodological requirements that allow their comparison. One article was a cohort study20. Three articles did not have surgical specimens as study objects; they were prospective studies that performed ultrasound and cytological evaluation of patients with and without a diagnosis of HD and documented the suspicion or presumptive diagnosis of PTC in both groups21-23. Finally, ten articles, although they did have surgical specimens as objects of study, lacked a comparison group, so the prevalence of PTC in specimens with and without changes compatible with CLT could not be compared24-33.
Because there are inconsistent results regarding the association between CLT and PTC, in addition to the fact that existing meta-analyses have included methodologically non-comparable studies without proposing clear sources of selection bias and that many other related studies have been published in recent years, we performed a comprehensive meta-analysis to investigate the possible associations of CLT and PTC.
Methods
Search strategy and inclusion criteria
The systematic review process was performed according to the parameters established in The PRISMA Statement for Reporting Systematic Reviews and Meta-Analyses34. The systematization was performed by implementing the Review Manager 5.3 software and was summarized according to the flowchart proposed by the PRISMA Statement35. The proposed methodology was included in the International Prospective Register of Systematic Reviews (PROSPERO) database of the Center for Reviews and Dissemination of the National Institute for Health Research under ID CRD42020168562.
The literature search was performed in the databases Excerpta Medica dataBASE - Embase - of the Elsevier Publishing House and Medline - PubMed Central - of the US National Library of Medicine. The search period was established from January 1, 1950 to December 31, 2020. The search criteria were defined by the following group of descriptors: (“Hashimoto Disease” OR “Hashimoto Thyroiditis” OR “Hashimoto” OR “Thyroiditis” OR “Chronic Lymphocytic Thyroiditis” OR “Lymphocytic Thyroiditis” OR “Chronic Thyroiditis”) AND (“Papillary Thyroid Carcinoma” OR “Papillary Carcinoma” OR “Thyroid Carcinoma”).
The study selection process was carried out in five stages, based on the following definitions: a study was considered as one that addressed the central theme when the study of the association of PTC in patients with CLT or HD was implicit in its title or in its structured abstract. The inclusion criteria were as follows: 1. The object of study had to be defined as a surgical specimen; 2. The methodology of the original articles had to be retrospective, cross-sectional and analytical; 3. The diagnosis of CLT had to be made with histological parameters, with or without confirmation by anti-thyroid antibodies; 4. The diagnosis of PTC should have been made according to temporal parameters and defined by the International Classification of Endocrine Tumors proposed by the World Health Organization; and 5. The main purpose of the study should have been, therefore, to compare the prevalence of PTC in two groups of surgical specimens, categorized according to the presence or absence of histological changes compatible with CLT.
Exclusion criteria
Once the concept of a study addressing the central theme and the inclusion criteria were defined, exclusion criteria were applied in the following phases. During phase I, duplicate studies were excluded; during phase II, studies that did not address the central theme were excluded; in phase III, articles that addressed the central theme but were not original studies or, if original, were not retrospective, cross-sectional and analytical in methodology were excluded. Articles whose full text could not be found were also excluded during this phase. When there was no consensus on the application of a criterion, a researcher decided whether the article was excluded or included during this phase. During phase IV, after review of the full text, articles were excluded if the study population analyzed did not include all the surgical specimens available during the study period, since this would create an important selection bias.
Data collection process
Data collection was carried out independently by two researchers and tabulated in a pilot form. Subsequently, it was unified in a database by a third researcher, and in cases where there was a discrepancy, the data from the corresponding article were extracted by the latter. From each article, the number of events was collected as follows: Group 1, case with CLT-compatible changes in surgical specimen, in which a PTC was documented; Group 2, case without CLT-compatible changes in surgical specimen, in which a PTC was documented; Group 3, case with CLT-compatible changes in surgical specimen, in which a PTC was not documented; and Group 4, case without CLT-compatible changes in surgical specimen, in which a PTC was not documented. The variables were tabulated using Excel 2020 ® software.
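As an illustration of how the four groups extracted from each article might be stored, the following minimal sketch defines one record per study. The class name, field names and the example counts are hypothetical and are not taken from any of the included articles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StudyCounts:
    """Specimen counts from one article, following Groups 1-4 in the text."""
    clt_ptc: int        # Group 1: CLT-compatible changes, PTC documented
    no_clt_ptc: int     # Group 2: no CLT-compatible changes, PTC documented
    clt_no_ptc: int     # Group 3: CLT-compatible changes, PTC not documented
    no_clt_no_ptc: int  # Group 4: no CLT-compatible changes, PTC not documented

    @property
    def total(self) -> int:
        # All surgical specimens analyzed in the article
        return self.clt_ptc + self.no_clt_ptc + self.clt_no_ptc + self.no_clt_no_ptc

    @property
    def clt_total(self) -> int:
        # Specimens with CLT-compatible changes (Groups 1 and 3)
        return self.clt_ptc + self.clt_no_ptc

    @property
    def ptc_total(self) -> int:
        # Specimens with PTC (Groups 1 and 2)
        return self.clt_ptc + self.no_clt_ptc

    def as_table(self):
        # Rows: CLT present / absent; columns: PTC present / absent
        return [[self.clt_ptc, self.clt_no_ptc],
                [self.no_clt_ptc, self.no_clt_no_ptc]]


# Hypothetical counts, for illustration only
example = StudyCounts(clt_ptc=30, no_clt_ptc=100, clt_no_ptc=70, no_clt_no_ptc=800)
```

Arranging the counts with the exposure (CLT) in rows and the outcome (PTC) in columns gives the contingency table used in the bivariate analysis.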
Statistical analysis
The Review Manager ® 5.3 and SPSS ® 25.0 statistical packages were used for the statistical analysis, the forest plots, the calculation of statistical heterogeneity and the evaluation of the bias inherent to meta-analyses.
For the descriptive analysis, the frequency of specimens with findings compatible with CLT, the frequency of specimens with findings compatible with PTC and the frequency of specimens with simultaneous diagnosis of PTC and CLT were calculated. For the bivariate inferential analysis, contingency tables were constructed, and the prevalence ratio (PR) and odds ratio (OR) were calculated for the diagnosis of PTC in specimens with and without findings compatible with CLT in each of the studies. The statistical significance of this association was calculated using Pearson’s Chi-square test. P values less than 0.05 were considered statistically significant.
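The bivariate calculations described above can be sketched for a single study as follows. This is only an illustration under stated assumptions: the function name and the example counts are hypothetical, the Chi-square statistic is computed without continuity correction, all four cells are assumed to be non-zero, and the p value uses the fact that a Chi-square variable with one degree of freedom is the square of a standard normal variable.

```python
import math


def two_by_two_measures(a, b, c, d):
    """Measures for one 2x2 table.

    a: CLT present, PTC present (Group 1)
    b: CLT present, PTC absent  (Group 3)
    c: CLT absent,  PTC present (Group 2)
    d: CLT absent,  PTC absent  (Group 4)
    """
    n = a + b + c + d
    prev_clt = a / (a + b)       # prevalence of PTC in specimens with CLT
    prev_no_clt = c / (c + d)    # prevalence of PTC in specimens without CLT
    pr = prev_clt / prev_no_clt  # prevalence ratio
    odds_ratio = (a * d) / (b * c)
    # Pearson Chi-square for a 2x2 table, without continuity correction
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a Chi-square with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return {"PR": pr, "OR": odds_ratio, "chi2": chi2, "p": p_value}


# Hypothetical counts, for illustration only
result = two_by_two_measures(30, 70, 100, 800)
significant = result["p"] < 0.05
```

In this hypothetical table the prevalence of PTC is 30% with CLT and about 11% without it, so the PR is 2.7 while the OR is about 3.4; the two measures diverge because PTC is not a rare outcome in surgical series.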
Inferential analysis and statistical heterogeneity
The pooled effect was estimated using a random effects model, justified by the fact that it is not possible to rule out the presence of clinical heterogeneity in the studies. OR was used as the effect estimator. To evaluate statistical heterogeneity, Cochran's Q test statistic was used, and the presence of heterogeneity was defined with a value of p<0.1. The I2 statistic was used to estimate the degree of variability in the estimate of the overall effect that is attributable to the heterogeneity of the studies in the analysis. The information and results corresponding to the stratified inferential analysis and the assessment of statistical heterogeneity were summarized using a forest plot. Finally, funnel plots were used to assess the presence of bias attributable to the weight of small studies; their symmetry was assessed visually.
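For readers unfamiliar with these quantities, the sketch below pools odds ratios with a random effects model and reports Q and I2. It assumes the DerSimonian-Laird estimator with inverse-variance weights on the log OR scale, which is one common choice and not necessarily the exact configuration used in Review Manager for this analysis; the function name and the example tables are hypothetical, and tables with zero cells are not handled.

```python
import math


def pool_random_effects(tables, z=1.96):
    """Pool ORs from 2x2 tables (a, b, c, d) with DerSimonian-Laird weights.

    Assumes every cell is non-zero; z=1.96 gives a 95% confidence interval.
    """
    log_or = [math.log((a * d) / (b * c)) for a, b, c, d in tables]
    var = [1 / a + 1 / b + 1 / c + 1 / d for a, b, c, d in tables]

    # Fixed-effect (inverse-variance) estimate, needed to compute Q
    w = [1 / v for v in var]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_or)) / sum_w

    # Cochran's Q and the I2 statistic (percentage, truncated at 0)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_or))
    df = len(tables) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Between-study variance (tau squared), truncated at 0
    c_term = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c_term) if c_term > 0 else 0.0

    # Random-effects weights and pooled estimate
    w_re = [1 / (v + tau2) for v in var]
    pooled = sum(wi * yi for wi, yi in zip(w_re, log_or)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {
        "pooled_or": math.exp(pooled),
        "ci_low": math.exp(pooled - z * se),
        "ci_high": math.exp(pooled + z * se),
        "Q": q,
        "I2": i2,
        "tau2": tau2,
    }


# Hypothetical tables, for illustration only
homogeneous = pool_random_effects([(30, 70, 100, 800)] * 3)
heterogeneous = pool_random_effects([(30, 70, 100, 800), (10, 90, 100, 800)])
```

When the studies agree, tau squared is zero and the model reduces to the fixed-effect estimate; when they disagree, the weights become more even across studies and the confidence interval widens.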
Results
After exclusion of duplicate articles, the literature search strategy identified 3597 potentially eligible articles. A total of 36 articles met the inclusion criteria, that is, their main purpose was to compare the prevalence of PTC in two groups of surgical specimens, categorized according to the presence or absence of histological changes compatible with CLT. Of these, 11 articles were excluded because a thorough review of their methodology showed that not all the specimens available during the study period were included in the statistical analysis36-46. Three further articles47-49 were excluded because, although they included all the specimens available during the study period, the expected count obtained when applying the Chi-square test was lower than the minimum, which suggested that the sample could be insufficient for an inferential analysis. Finally, 22 articles50-71 were included which, in addition to meeting the inclusion criteria, allowed an inferential analysis with the lowest possible source of selection bias (Figure 1).
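The expected-count check mentioned above can be illustrated as follows. The minimum of 5 per cell used here is the conventional rule of thumb for the Chi-square test and is an assumption, since the threshold actually applied is not stated; the function names and counts are hypothetical.

```python
def expected_counts(a, b, c, d):
    """Expected cell counts of a 2x2 table under independence.

    Rows: CLT present / absent; columns: PTC present / absent.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    return [[r * col / n for col in col_totals] for r in row_totals]


def meets_minimum_expected(a, b, c, d, minimum=5.0):
    # Assumed rule of thumb: every expected count should reach the minimum
    smallest = min(min(row) for row in expected_counts(a, b, c, d))
    return smallest >= minimum


# Hypothetical tables, for illustration only
large_series = meets_minimum_expected(30, 70, 100, 800)
small_series = meets_minimum_expected(2, 8, 3, 87)
```

In the second hypothetical table the smallest expected count is 0.5, so a series of that size would not support the Chi-square test under this rule.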
The study population consisted of 63,548 surgical specimens. The average prevalence of CLT and PTC in surgical specimens was 15.5% and 22.9%, respectively. The average prevalence of concurrence between PTC and CLT in surgical specimens was 32.5%. Table 1 summarizes the results of the descriptive and inferential statistical analyses performed from the data extraction of each of the articles. The pooled OR, based on the studies, was 1.81 (95% CI: 1.51-2.21). However, there was significant heterogeneity in the distribution of PR and OR across studies (I2 = 91%; p<0.00001) (Figure 2). The shape of the funnel plot of the studies included in the analysis appears to be symmetrical, indicating the absence of bias attributable to small studies (Figure 3).
CLT, Chronic lymphocytic thyroiditis
PTC, Papillary thyroid carcinoma
PR, Prevalence ratio
(α) Prevalence as a function of the presence or absence of the independent
Discussion
The meta-analyses available in the literature, in which the authors attempt to gather evidence suggesting a causal association between CLT-compatible changes and the presence of PTC in thyroidectomy specimens, have a common denominator: they suggest that the diagnosis of PTC is documented more frequently in specimens with CLT-compatible changes. A recurring finding is that, after the pooled estimation of this risk, there is significant heterogeneity among the studies, even when no bias attributable to small studies is demonstrated; this heterogeneity should therefore be interpreted as a wake-up call for those of us who study this association. Those authors suggest that biases such as the high variability in the methodology of each of the studies, the absence of clear criteria for including the objects of study in the comparison groups and the absence of homogeneous criteria for indicating the surgical management of HD patients may be responsible for the heterogeneity. However, in the present meta-analysis, in which the included studies are highly homogeneous in their methodological characteristics, we obtained results like those published in the literature. Therefore, it is possible to conclude that there are sources of bias that have not been considered in the last 40 years, and that retrospective studies, given their limitations, will not allow us to go beyond suggesting a possible association.
Among the methodological limitations, it is important to mention that none of the studies, despite their retrospective nature, includes a sample size calculation; that is, they do not stipulate the minimum number of surgical specimens with and without CLT-compatible changes needed to obtain results with acceptable statistical power in the inferential analyses. In some series, which were excluded during the systematic review process, the authors do not even give the reasons for not including all the specimens available during the study period. Therefore, we suggest that in prospective studies the sample size be determined by groups, to compare the proportion of specimens with PTC between a group of specimens with CLT and a group of specimens without CLT. Since we cannot prove a causal association, the hypothesis of independence should be two-sided, and the dependent variable, corresponding to the finding of PTC in surgical specimens, should be qualitative, nominal and dichotomous.
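A sample size calculation of the kind suggested above could look like the following minimal sketch. It assumes the standard normal-approximation formula for comparing two independent proportions with a two-sided test and unpooled variances; the function name, the default significance level and power, and the example prevalences are hypothetical and should be replaced with values appropriate to each clinical scenario.

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80, ratio=1.0):
    """Specimens needed per group to compare two proportions (two-sided test).

    p1: expected prevalence of PTC in specimens with CLT
    p2: expected prevalence of PTC in specimens without CLT
    ratio: specimens without CLT per specimen with CLT (n2 / n1)
    Returns (n1, n2), rounded up.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2) / ratio
    n1 = math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)
    n2 = math.ceil(ratio * n1)
    return n1, n2


# Hypothetical prevalences, for illustration only
equal_groups = sample_size_two_proportions(0.30, 0.20)
unequal_groups = sample_size_two_proportions(0.30, 0.20, ratio=6)
```

The `ratio` argument reflects the fact that specimens without CLT are usually far more numerous than specimens with CLT; with unequal groups, fewer specimens with CLT are needed, at the cost of a larger total sample.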
When analyzing the prevalence of PTC in all the articles included (Table 1), it is possible to observe that a Berkson bias may be present in some series. It is striking that the prevalence of PTC in the studies by Siriwera et al, Buyukasik et al, Roberti et al, and Konturek et al is much lower than average, while in the studies by Zhang et al, Ye et al, and Giagourta et al it is much higher than average. This may mean that the institutions of the first group do not have an important annual volume of thyroidectomies, whereas the institutions of the second group are reference centers for the management of thyroid cancer. In centers whose population characteristics are similar to those of either group, the sample size should therefore be even larger than usual, and even be collected over a longer time interval, to avoid sources of bias.
However, even with a proper calculation of sample sizes, it may be difficult to obtain a significant number of specimens with changes compatible with CLT during the study period. The results in Table 1 show that, on average, there is one specimen with changes compatible with CLT for every six specimens without them. This situation may act as a selection bias, since patients who would have CLT in their specimen are taken to surgery less frequently. Three circumstances may underlie this selection bias: changes compatible with CLT are more frequent in the group of specimens with PTC; most patients with CLT will not undergo surgery during the natural history of the disease; and preoperative clinical suspicion of PTC might be, in the majority of cases, the reason for their surgical management.
Conclusions
The current literature suggests that there is a higher risk of documenting a PTC in surgical specimens in which changes compatible with CLT are observed; however, there are sources of bias that cannot be controlled in retrospective studies, so we recommend studying the hypothesis of a higher probability of diagnosing a PTC in specimens with changes compatible with CLT by means of prospective methodologies. For this, possible sources of bias should be controlled by performing a sample size calculation based on the prevalence of PTC in specimens with and without CLT for each of the clinical scenarios, to ensure real statistical power. Finally, preoperative suspicion of a malignant neoplasm is, among others, the main reason for performing thyroidectomy in patients with or without a clinical impression of autoimmune thyroid disease; this fact will continue to be a constraint that is difficult to control and may be considered the possible main source of bias in these studies.