Introduction
Inflammatory bowel disease (IBD) is a group of pathologies that includes ulcerative colitis (UC) and Crohn’s disease (CD), which share similar manifestation patterns but differ enough to be classified separately1,2. CD is characterized by transmural and fistulizing involvement that can affect the entire gastrointestinal tract and the perineal region, while UC presents with mucosal compromise limited to the colon1.
The recent guidelines of the European Crohn’s and Colitis Organization (ECCO) for diagnosing UC and CD describe no established “gold standard” but suggest diagnosis through clinical, laboratory, imaging, endoscopic, and histopathological findings2. The use of genetic and serological tests is not recommended2.
Currently, the diagnosis and monitoring of IBD are based mainly on direct evaluation of the mucosa in endoscopic studies, which provide information on the extent and severity of the lesions and possible complications3,4. However, endoscopy is poorly suited to periodic disease monitoring given its high cost, limited availability, and invasive nature3,4. By contrast, fecal calprotectin (FC) is widely available, easy to use, affordable, and currently the best-characterized biomarker in IBD.
Multiple studies have shown that FC is a reliable marker that evaluates the presence or absence of endoscopic activity and severity4. It is superior to C-reactive protein (CRP) and other fecal biomarkers3. Still, no consensus exists on the evidence for using FC or its diagnostic validity.
This paper notes the limitations of the current diagnostic strategy and the importance of defining a more accessible diagnostic and follow-up method. It also reviews the availability of biomarkers, among them FC, a non-invasive diagnostic aid that could distinguish IBD from functional pathologies and, in turn, identify relapses in both CD and UC. For this research, we therefore opted for a systematic literature review, since it allows us to identify, condense, and evaluate the current information about the diagnostic accuracy of FC in adult patients with IBD.
Materials and methods
The research design is a systematic literature review (SLR) validating diagnostic testing using the PICOT question strategy. The SLR thoroughly followed the recommendations of the PRISMA checklist.
A comprehensive and systematic search strategy was devised to identify available and relevant studies. We used MeSH and DeCS terms in the different databases: PubMed, Scopus, Science Direct, OVID, Cochrane Library, Scielo, Web of Science, and Virtual Health Library. No language restrictions were applied.
We kept records of the searches and exported their results to the Rayyan software, in which the articles were selected by title and abstract. Disagreements between the two researchers were resolved by consensus. Duplicates were discarded using the same software, and we subsequently created an Excel matrix with the selected articles to apply the inclusion and exclusion criteria.
The research included all studies available in full text, conducted between 1992 and July 2022, and published in English, Portuguese, or Spanish that evaluated FC as a diagnostic method in adults with a diagnosis of IBD established by another diagnostic method. Conversely, we discarded studies that included patients diagnosed with another pathology that alters FC, studies performed in animals, and papers with incomplete data that lacked the variables needed for data analysis.
After selecting the articles by title and abstract, we read the full texts to assess their eligibility, obtain the studies for the synthesis of information, and define the level of evidence with the help of the QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies) tool.
To evaluate the methodological quality of the diagnostic method studies, we employed the QUADAS-2 checklist for diagnostic accuracy studies using the four domains: patient selection, index test, reference standard, flow and timing, and their relevant applicability5. This tool is fully available on the website, was adapted for our type of study, and was applied by both researchers. This tool is designed to assess the quality of primary diagnostic accuracy studies, but not to replace the review data extraction process, and should be used in addition to primary data extraction.
Sensitivity, specificity, and positive and negative predictive values were determined to measure diagnostic efficacy. In addition, measures such as the cutoff point and area under the curve were considered in the studies that reported them6. After verifying the quality of the information, we organized and documented the selected research for comparison by characteristics, design, population, sample, and study conditions, and thus created a matrix with the evidence.
The SLR is an information synthesis study, i.e., a study of studies that does not have individuals (neither human beings nor animals) as its object of study. Nonetheless, this SLR was evaluated by the Universidad de Caldas ethics committee, which gave its endorsement. In addition, an attempt was made to reduce biases and thus avoid improper manipulation of information.
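As an illustration of how these four measures follow from a 2×2 table of index test results against the reference standard, a minimal Python sketch is given below. The class name, the function name, and the counts are hypothetical and are not taken from any of the included studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts of index test (FC) results against the reference standard."""
    tp: int  # FC positive, reference standard positive
    fp: int  # FC positive, reference standard negative
    fn: int  # FC negative, reference standard positive
    tn: int  # FC negative, reference standard negative


def diagnostic_metrics(t: TwoByTwo) -> dict[str, float]:
    """Return sensitivity, specificity, PPV, and NPV as proportions (0..1)."""
    return {
        "sensitivity": t.tp / (t.tp + t.fn),
        "specificity": t.tn / (t.tn + t.fp),
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
    }


# Hypothetical counts, for illustration only (not data from the review).
example = TwoByTwo(tp=45, fp=10, fn=5, tn=40)
metrics = diagnostic_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that the predictive values depend on the prevalence of disease in the sample, so they are not directly comparable across studies with different populations.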
Results
The initial search yielded 352,843 articles, indexed mainly in PubMed, followed by Scopus, Science Direct, Cochrane Library, OVID, and Web of Science, and, in Spanish, LILACS and Scielo. Due to the large number of search results, we performed a first filter by title, resulting in 7,584 articles. Then, using the Rayyan software (Intelligent Systematic Review), 2,196 duplicates were detected and discarded. A total of 5,388 papers were reviewed by title and abstract in a double-masked manner, mainly rejecting studies with animals and with pediatric and obstetric populations. We excluded studies that evaluated treatment rather than diagnostic accuracy and those that involved serum calprotectin. Only articles available in English, Spanish, and Portuguese were selected. Finally, 221 articles were chosen and thoroughly reviewed. This process is summarized in Figure 1.
At this point, we reviewed the full-text articles, verifying that they evaluated the diagnostic accuracy of FC, that the study population did not have another pathology that could affect the FC results, and that they contained all the data needed to evaluate diagnostic accuracy. Therefore, 19 articles written mainly in English were included in the SLR.
The final result of the SLR, presented in Table 1, included 18 articles written mainly in English and published between 2004 and 2019; 2018 was the year with the largest number of publications, with five articles. The most frequent study design was the prospective cohort (61.1%), followed by retrospective cohort and case-control designs (15.8% each), and one cross-sectional observational study (5.5%).
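The counts reported in this selection process can be checked with simple arithmetic. The short Python sketch below uses only the figures stated above; the variable names are illustrative.

```python
# Counts as reported in the selection process described above.
after_title_filter = 7_584    # articles remaining after the first filter by title
duplicates_removed = 2_196    # duplicates detected and discarded in Rayyan
full_text_reviewed = 221      # articles chosen for full-text review

# Articles screened by title and abstract once duplicates are removed.
screened = after_title_filter - duplicates_removed

# Articles rejected during title and abstract screening.
rejected_at_screening = screened - full_text_reviewed

print(f"Screened by title and abstract: {screened}")
print(f"Rejected at screening: {rejected_at_screening}")
```

The first result matches the 5,388 papers reported as reviewed by title and abstract.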
Table prepared by the authors.
The majority of studies were conducted in a population with a final or presumptive diagnosis of IBD. In one of the articles, the population studied was patients who presented with gastrointestinal symptoms suggestive of IBD, and in two of the articles healthy people served as a control group.
The studies were carried out in the adult population, as indicated in the inclusion criteria; however, two articles did not report the age of the participants. Five of the 18 studies chosen did not differentiate the sex of the participants, while in eight studies (44.4%) male participants predominated, and in the remaining five (27.8%) females predominated.
In all articles, subjects were excluded if they had consumed non-steroidal anti-inflammatory drugs or antibiotics during the three months before enrollment, suffered from concomitant severe diseases, were pregnant, or used alcohol.
Of the total number of studies, 14 studied clinical activity to determine the risk of relapse, two analyzed FC to distinguish between IBD and a functional disorder such as irritable bowel syndrome (IBS), one evaluated both problems, and another used FC to assess progression in both UC and CD.
In 94.4% of the articles, the diagnosis of IBD was made through endoscopic studies such as total colonoscopy or sigmoidoscopy, and only one used an indium-labeled white cell scan (WCS) for this purpose.
The Montreal classification was employed to classify IBD progression. It was used in three studies, in which the primary phenotype was inflammatory (B1), followed by stenosing (B2). The Mayo score, validated to categorize clinical activity in UC, was used in 11 of the 18 studies, in which relapse or active disease was defined by a score greater than 2; in one study, it was defined from 4 points. Seven studies applied the Crohn’s disease activity index (CDAI), with 150 points as the cutoff between clinical activity and remission. The simple endoscopic score for Crohn’s disease (SES-CD), the Harvey-Bradshaw index (which uses only five CDAI variables), and the Truelove-Witts index in UC were also used in the articles to define flares.
FC was used as a diagnostic method in all articles, as specified in the inclusion criteria. Thirteen articles reported that the feces were frozen at -20 ºC for later processing, one article mentioned between -2 and -4 ºC, and in another, the samples were frozen at -80 ºC. Three articles did not describe their sample collection and handling protocol. Six articles reported enzyme-linked immunosorbent assay as the sample processing method, four studies used a qualitative test assay, and three studies used fluorescence enzyme immunoassay.
In all the selected articles, summarized in Table 2, the ROC curve was used to determine the best FC cutoff value; however, there is no consensus on the cutoff points, which range from 48.5 to 710 μg/g. Sensitivity varied considerably across these cutoffs, from 70% to 100%, while specificity was more heterogeneous, from 50% (with a cutoff point of 15 μg/g) to 100%. The positive (LR+) and negative (LR-) likelihood ratios were calculated for all the papers. According to the LR+, in four articles FC confirmed the disease with high certainty and also had a very low LR-, which is highly relevant to ruling out disease. In three articles, however, these values had poor relevance for confirming or ruling out the pathology.
Table prepared by the authors.
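For reference, the likelihood ratios follow directly from sensitivity and specificity: LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The Python sketch below illustrates the calculation. The function names and the example values are hypothetical, and the interpretation thresholds (LR+ above 10, LR- below 0.1) are a commonly cited rule of thumb assumed here for illustration, not thresholds stated in this review.

```python
import math


def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for sensitivity and specificity given as proportions (0..1)."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity <= 1.0):
        raise ValueError("sensitivity and specificity must lie between 0 and 1")
    # A specificity of exactly 1 leaves LR+ unbounded; a specificity of 0 does the same for LR-.
    lr_pos = math.inf if specificity == 1.0 else sensitivity / (1.0 - specificity)
    lr_neg = math.inf if specificity == 0.0 else (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def interpret(lr_pos: float, lr_neg: float) -> dict[str, bool]:
    """Apply an assumed rule of thumb (not taken from the review)."""
    return {
        "confirms_with_high_certainty": lr_pos > 10,
        "rules_out_with_high_certainty": lr_neg < 0.1,
    }


# Hypothetical values, for illustration only.
moderate = likelihood_ratios(0.90, 0.80)
strong = likelihood_ratios(0.95, 0.95)
print(moderate, interpret(*moderate))
print(strong, interpret(*strong))
```

Unlike the predictive values, likelihood ratios do not depend on disease prevalence, which makes them easier to compare across studies with different populations.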
To evaluate the methodological quality of the diagnostic accuracy studies, the QUADAS-2 checklist was used as a questionnaire with yes/no questions that classified the domains as having high or low risk34.
With the data organized in the Excel matrix, as shown in Table 3, we noted that in Domain 1 (patient selection), 13 studies have a low risk of bias. In Domain 2 (index test), six studies raise concern about a high risk of bias. In Domain 3 (reference standard), all articles have a low risk of bias, and in Domain 4 (the bias that the flow and timing of patients could introduce), four studies have a high risk. In short, eight studies raise concern about bias in at least one domain, whereas the majority show little concern. Concern about applicability is high in only one study, regarding the index test, and low in the remaining 94.4%.
# | Domain 1: Patient selection | Domain 1: Applicability | Domain 2: Index test | Domain 2: Applicability | Domain 3: Reference standard | Domain 3: Applicability | Domain 4: Flow and timing |
---|---|---|---|---|---|---|---|
1 | High | Low | High | Low | Low | Low | High |
2 | Low | Low | High | Low | Low | Low | High |
3 | High | Low | High | Low | Low | Low | High |
4 | High | Low | Low | Low | Low | Low | Low |
5 | Low | Low | High | Low | Low | Low | Low |
6 | High | Low | High | Low | Low | Low | Low |
7 | Low | Low | Low | Low | Low | Low | Low |
8 | Low | Low | Low | Low | Low | Low | Low |
9 | High | Low | Low | Low | Low | Low | Low |
10 | Low | Low | Low | Low | Low | Low | Low |
11 | Low | Low | Low | Low | Low | Low | Low |
12 | Low | Low | High | High | Low | Low | High |
13 | Low | Low | Low | Low | Low | Low | Low |
14 | Low | Low | Low | Low | Low | Low | Low |
15 | Low | Low | Low | Low | Low | Low | Low |
16 | Low | Low | Low | Low | Low | Low | Low |
17 | Low | Low | Low | Low | Low | Low | Low |
18 | Low | Low | Low | Low | Low | Low | Low |
Table prepared by the authors.
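The domain-level counts can be reproduced from the table above. The Python sketch below transcribes only the study numbers rated "High" for risk of bias in each domain, together with the single "High" applicability rating; the variable names are illustrative.

```python
N_STUDIES = 18

# Study numbers rated "High" for risk of bias, transcribed from the table above.
high_risk = {
    "patient_selection": {1, 3, 4, 6, 9},
    "index_test": {1, 2, 3, 5, 6, 12},
    "reference_standard": set(),
    "flow_and_timing": {1, 2, 3, 12},
}

# Study numbers rated "High" for applicability concern (index test column only).
high_applicability_concern = {12}

# Low-risk count per domain is the complement of the high-risk set.
low_risk_counts = {domain: N_STUDIES - len(studies) for domain, studies in high_risk.items()}

# Studies with a "High" rating in at least one risk-of-bias domain.
any_high_risk = set().union(*high_risk.values())

print(low_risk_counts)
print(f"Studies with at least one high-risk domain: {sorted(any_high_risk)}")
print(f"Studies with low applicability concern: {N_STUDIES - len(high_applicability_concern)}")
```

Counting studies with at least one "High" rating is a tally, not a summary quality score, so it stays within the intended use of QUADAS-2.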
For clarification, this evaluation should not be used for “quality scoring” since it is a methodology focused on the risk of bias and applicability7. If a study is judged “high” or “unclear” in one or more domains, then it may be regarded as “at risk of bias” or as having “concerns regarding applicability”8. As recommended, a methodology yielding a summary quality score was not used because the interpretation of the score could be problematic and potentially misleading9.
Discussion
This SLR evaluated the quality of the scientific evidence on the diagnostic efficacy of FC in adult patients with IBD, its ability to distinguish between functional and organic intestinal disorders (such as IBS and IBD), and its relationship with clinical activity to define relapse or remission.
Previous studies have pointed out the importance of conducting a rigorous and extensive literature search to ensure the reliability of an SLR. In the SLR completed in 2007 by Gisbert et al., the bibliographic search was carried out only in Medline10, a database estimated to contain only about 60% of the available literature. In our study, by contrast, an exhaustive search was performed in multiple databases (Cochrane Library, PubMed, Scopus, Science Direct, OVID, Scielo, Web of Science, and Virtual Health Library) to achieve the greatest possible coverage of the subject matter.
Moreover, most articles included in this review have a low risk of bias. The study by Orcajo-Castelán, in which several methodologies were used to evaluate diagnostic accuracy, concluded that no scientific publications are free of biases, but there are procedures to reduce them11.
In a diagnostic test accuracy SLR by Hosseini et al. in 2022, QUADAS-2, a tool designed exclusively for diagnostic accuracy studies, was also used. Both in our research and in the review by Hosseini et al., the tool was applied independently by the authors, with the difference that their SLR included a third author who evaluated the discrepancies between the two principal authors. It should be noted that they classified the biases as low, moderate, and high risk. In contrast, only high and low risk were used in our review, according to the domains and their applicability12.
Notably, the majority of the articles found were published in 2018. Although the reason is not clear, we infer that this reflects the increase in the incidence and recognition of IBD, as in previous years this pathology was misclassified or underdiagnosed.
In 1992, Roseth et al. developed the first method for determining FC using an enzyme-linked immunosorbent assay (ELISA)13. Since then, the method has been extensively improved and validated, and it now requires only small stool samples14. However, literature that meets the validity requirements for diagnostic accuracy has been found only since 2004, possibly due to technological advances in the diagnostic method and its dissemination into routine clinical practice.
The main epidemiological design used in diagnostic accuracy studies is prospective, followed by retrospective. Research conducted in 2010 by pediatric gastroenterologists showed that all the studies evaluated had a prospective epidemiological design and included consecutive outpatients with suspected IBD15. That SLR analyzed a smaller sample than the present work and differed in that it included six articles with adults and six with a pediatric population15.
FC is an indirect indicator of the state of the intestinal mucosa. To date, several meta-analyses have shown that it helps to discriminate IBD from other diseases, mainly functional ones, and to predict relapse in IBD patients in remission by evaluating clinical activity with various indices16, which is also evident in the results of this research.
This SLR, together with information found in the literature, indicates that the level of FC is directly associated with the indices of clinical and endoscopic activity of IBD, with high sensitivity and specificity. It is therefore a valuable tool in clinical practice, with benefits such as a reduction in invasive procedures, early diagnosis of relapse, and follow-up in remission, because it is easy to perform, non-invasive, and relatively low-cost compared to colonoscopy. However, no consensus has been reached on an optimal cutoff point for identifying organic versus functional disease or relapse15,17,18. The current data are still inconclusive regarding a cutoff level of FC as a predictor of clinical activity or remission, as values vary from 48.5 to 710 μg/g18,19.
An SLR in 2013 showed that most studies evaluating FC used ELISA methods and that most manufacturers recommended 50 μg/g as a cutoff point17, as in the present study. Colonoscopy continues to be the primary reference standard18 and is considered the gold standard for evaluating inflammation of the intestinal mucosa, although it is an expensive and invasive procedure; hence the interest in biomarkers such as FC, which can perform comparably to colonoscopy.
Conclusion
FC is a reliable surrogate marker of endoscopic activity in IBD and is especially useful in predicting endoscopic activity to aid the differentiation of functional from organic disease. Thus, it has the potential to be used as a diagnostic and monitoring biomarker in patients with IBD, although the lack of consensus on a cutoff point still limits its applicability and diagnostic accuracy. Colonoscopy remains the gold standard in all studies.
So far, the evidence is based on prospective design studies with a low risk of bias and little concern about their applicability. However, more studies are necessary to reach a consensus for decision-making in the clinical setting.