RECENT ESTIMATES suggest that between 10 and 20 percent of children and adolescents in low- and middle-income countries (LMICs) suffer from mental health problems (Erskine et al., 2017; Kieling et al., 2011). Despite this high prevalence, the mental health needs of children living in LMICs often go unattended because of lack of funding, political indifference, or lack of qualified clinicians (Kieling et al., 2011). Children's mental health problems are particularly prevalent in conflict-affected countries, but these countries may, in turn, be less capable of monitoring their children's mental health needs (Dimitry, 2012). In this context, it is fundamental to develop efficient and reliable instruments to assess and monitor children's mental health to make the problem visible and to inform public policy efforts aimed at reducing it.
Colombia has suffered from more than 50 years of civil conflict, leaving more than 1.4 million children and adolescents as direct victims (Red Nacional de Información, 2018). Until 2015, the country did not have information about the national prevalence of mental health problems in children, despite evidence suggesting it was high in specific regions (OIM, UNICEF, & ICBF, 2013). In 2015, Colombia carried out its first nationally representative mental health survey for children aged seven to 11 years (ENSM, according to its acronym in Spanish; Ministerio de Salud & Colciencias, 2015), which included a 24-item scale to measure children's mental health symptoms. Subject-matter experts selected the 24 items included in the ENSM from three different existing scales that are briefly explained below (Rodriguez et al., 2016): the Reporting Questionnaire for Children (RQC; Giel et al., 1981), the Child Behavior Checklist (CBCL; Achenbach, 1999), and the Brief-Screening Diagnostic Questionnaire (CBTD according to its acronym in Spanish; Caraveo y Anduaga, 2007).
First, the RQC is a 10-item scale developed by the World Health Organization (WHO) to screen for significant degrees of emotional and behavioral disorder or psychotic disorders (Castro, Billick, & Swank, 2016). The target population for the RQC is children between the ages of five and 15, and the main caregiver answers the questionnaire. Sample items include "Does the child wet or soil himself/herself?" and "Does the child tend to be alone?" The RQC has been administered in several countries, including Iraq, Ethiopia, Sudan, the Philippines, India, and Colombia (Giel et al., 1981). Previous evidence shows that the RQC has a predictive power for children's clinical disorders similar to that of the CBCL among children in Iraqi Kurdistan (Ahmad et al., 2007). Second, the CBTD is a 27-item questionnaire for parents, which comprises 10 items taken from the RQC and additional items reflecting further symptoms of mental health problems. The CBTD is intended to characterize common mental health problems, as well as hyperactivity, sadness, attention deficits, impulsivity, and antisocial behavior (Caraveo and Anduaga, 2007). The CBTD has been widely used in Mexico (Caraveo and Anduaga, 2007).
Lastly, the CBCL is a parent-report questionnaire to detect emotional and behavioral problems in children and adolescents (Achenbach, 1999). The instrument targets children between the ages of six and 18 and consists of 113 Likert-scale items (i.e., absent, occurs sometimes, and occurs often) that assess the presence of different symptoms of mental health problems in the past six months. Beyond the full set of 113 items, the CBCL has eight sub-scales, intended to measure anxiety, depression, somatic complaints, thought problems, attention problems, rule-breaking behavior, and aggressive behavior (Achenbach & Ruffle, 2000). Sample items include "can't concentrate, can't pay attention for long" and "complains of loneliness." The CBCL has been used in more than 30 countries, showing adequate psychometric properties in samples from North America, Asia, Africa, South America, the Caribbean, and Europe (Ivanova et al., 2007).
Even though the content validity of the 24 items included in the ENSM scale was discussed (see Ministerio de Salud & Colciencias, 2015), the psychometric properties of the overall scale, which comprises items from the RQC, CBCL, and CBTD, have not previously been assessed systematically.
The present study seeks to fill this gap by analyzing the dimensionality, reliability, and convergent validity of the mental health problems scale. Moreover, this study analyzes the information each item provides to determine whether a more efficient scale (i.e., shorter and with a high level of information) is feasible, which might facilitate its future implementation.
Even though Classical Test Theory (CTT) is the most widely used framework to analyze the psychometric properties of test scores, Item Response Theory (IRT) offers unique features to improve the efficiency of a scale. Contrary to CTT, where item statistics (e.g., percentage of items correct, item-test correlation, measures of reliability) are population-dependent (Lord, Novick, & Birnbaum, 1968), IRT estimates of item characteristics are assumed to be invariant across populations and occasions, and independent of the other items embedded in the test or questionnaire (Brennan, National Council on Measurement in Education, & American Council on Education, 2006). In particular, IRT assesses item discrimination, which refers to the extent to which the item is capable of distinguishing between individuals with different levels of the latent trait, and item location. Item location refers, in this context, to the level of the latent trait where the item is most reliable and precise in distinguishing between individuals (Embretson & Reise, 2013).
Another feature that makes IRT stand out is that it recognizes that a scale will be more reliable and precise at distinguishing between individuals within a certain segment of the latent continuum, whereas CTT assumes a single, homogeneous estimate of reliability (Brennan et al., 2006). For this reason, IRT provides information that can be used to design more efficient scales, allowing the selection of items that provide more information at the levels of the latent trait of interest (Jessen, Ho, Corrales, Yueh, & Shin, 2018). In doing so, it is possible to obtain shorter, easier-to-implement scales, as well as a set of items that allows a more reliable and targeted measure.
In the case of the mental health scale for children used in the ENSM, it is unclear whether the 26 items selected by subject-matter experts represent a single underlying dimension, hypothesized to be mental health problems. Moreover, it is unclear whether a future implementation using a shorter but highly informative scale is possible. These issues are critical not only to reduce the time and resources used in measuring children's mental health problems in Colombia in its post-conflict situation, but also to do so with precision, which is key to informing prevention and care efforts across the country. This study contributes to these objectives by answering the following research questions:
Does the mental health scale used in the ENSM measure a single factor (i.e., mental health problems) as intended?
What are the psychometric properties of the mental health scale, according to CTT and IRT frameworks?
Is it possible to implement a more efficient (i.e., with fewer items and high precision) scale for children's mental health problems on a future occasion?
Methods
Participants
The ENSM is a nationally representative survey of non-institutionalized children aged seven to 11 years, representing four regions (Atlantic, Western, Central, and Pacific), Bogotá, and each of the 32 national departments. The ENSM used a probabilistic, multistage sampling procedure, and the sample size was designed following findings from previous national studies (Rodríguez et al., 2016). The sample used in the present study includes 2,727 children, with complete information for all the cases included in the ENSM. According to the ENSM, the children were, on average, nine years old, and a little more than half of them were girls. In the sample, 19 percent of children belonged to an ethnic minority, around 58 percent lived with their parents, and 86 percent with their mother. Additionally, 98 percent attended school, and 21 percent were considered poor according to a multidimensional poverty index (Alkire & Foster, 2011). Table 1 presents details. ENSM surveys were collected between January and May 2015.
Instruments
The ENSM included a 26-item instrument to assess children's mental health problems. This instrument includes 10 items from the RQC (Giel et al., 1981), and additional items from the CBCL (Achenbach, 1999), the CBTD (Caraveo and Anduaga, 2007), and "others based on the experience of the research groups [i.e., subject-matter experts that participated in the ENSM]" (Rodriguez et al., 2016, p. 15). The 26 items included in the scale, presented in Appendix 1, refer to yes (coded as 1) or no (coded as 0) questions, aimed at identifying diverse symptoms of mental health problems in children. Children's main caregivers filled out the questionnaire. In this case, 80 percent were their mothers, 7.2 percent their fathers, and the remaining were other caregivers. Even though previous studies show that the RQC, CBCL, and CBTD have good psychometric properties for assessing children's mental health problems (e.g., Ahmad et al., 2007; Castro et al., 2016), to date there is no validity evidence on the internal structure (i.e., coherence) of the scale employed in the Colombian ENSM.
Statistical Analysis
To begin with, Classical Test Theory (CTT) statistics were estimated (Crocker & Algina, 1986; Novick, 1966; Traub & Rowley, 1991). To analyze the characteristics of each item, the percentage of affirmative answers was used as a classical item location estimate and the item-test correlation as a classical information estimate. For the overall scale, Cronbach's alpha was estimated to examine the reliability (i.e., internal consistency) of the scale. Subsequently, factor analysis was used to fit a unidimensional model to the data and assess the dimensionality of the scale by analyzing the share of variance accounted for by the first factor (Merino-Soto, López-Fernández & Grimaldo-Muchotrigo, 2019; Thompson, 2004).
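The three classical statistics described above can be sketched in a few lines. The following is a minimal illustration in Python, not the Stata code used in the study; the function name `ctt_item_statistics` and the simulated responses are assumptions introduced here for demonstration and do not reproduce the ENSM data.

```python
import numpy as np

def ctt_item_statistics(responses):
    """Classical item statistics for a (n_children, n_items) array of 0/1 answers."""
    responses = np.asarray(responses, dtype=float)
    n_items = responses.shape[1]
    total = responses.sum(axis=1)

    # Classical location estimate: share of affirmative answers per item.
    prop_yes = responses.mean(axis=0)

    # Classical information estimate: correlation of each item with the total score.
    item_test_r = np.array(
        [np.corrcoef(responses[:, i], total)[0, 1] for i in range(n_items)]
    )

    # Cronbach's alpha from item variances and total-score variance.
    item_var = responses.var(axis=0, ddof=1)
    total_var = total.var(ddof=1)
    alpha = n_items / (n_items - 1) * (1.0 - item_var.sum() / total_var)
    return prop_yes, item_test_r, alpha

# Simulated yes/no answers (hypothetical parameters, not ENSM estimates).
rng = np.random.default_rng(0)
theta = rng.normal(size=2000)                # latent scores
a = rng.uniform(0.8, 2.0, size=10)           # hypothetical discriminations
b = rng.uniform(1.0, 3.0, size=10)           # hypothetical locations
p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
simulated = (rng.random(p.shape) < p).astype(int)

prop_yes, item_test_r, alpha = ctt_item_statistics(simulated)
```

Because the simulated items are located above the mean of the latent trait, the proportions of affirmative answers are low, mirroring the kind of low-prevalence symptom items the scale contains.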
Furthermore, following the model presented in Equation 1, a two-parameter IRT model was fitted to the data (Embretson & Reise, 2013; Lord et al., 1968). In the model, θ_p represents the latent score (i.e., mental health problems-symptoms) of each child p, which is standardized (i.e., mean of zero and standard deviation of 1). Additionally, α_i represents the discrimination parameter, which indicates how well an item can distinguish between children with slightly different levels of the latent variable, and it is similar to factor loadings in a confirmatory factor analysis where items are continuous. Particularly, the discrimination parameter shows that a 1 unit increase in the latent variable θ produces an α_i increase in the log of the odds of answering the item affirmatively. Finally, b_i represents item location, which shows the level of the latent variable θ at which children have even odds of answering each item affirmatively. An advantage that IRT has over CTT is that the former estimates parameters that are invariant to populations of items and individuals, whereas the latter produces population-dependent parameters (Embretson & Reise, 2013; Traub & Rowley, 1991).
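Equation 1 is not reproduced in this version of the text. Under the standard two-parameter logistic parameterization, which agrees with the parameter definitions given above, it would plausibly take the following form. The symbol Y_ip, denoting the answer of child p to item i, is notation assumed here for illustration.

```latex
P(Y_{ip} = 1 \mid \theta_p)
  = \frac{\exp\left[\alpha_i(\theta_p - b_i)\right]}
         {1 + \exp\left[\alpha_i(\theta_p - b_i)\right]}
\quad\Longleftrightarrow\quad
\log\frac{P(Y_{ip} = 1 \mid \theta_p)}{1 - P(Y_{ip} = 1 \mid \theta_p)}
  = \alpha_i(\theta_p - b_i)
```

In this form, a one-unit increase in θ_p raises the log-odds by α_i, and the log-odds equal zero (even odds) exactly when θ_p = b_i, matching the two interpretations stated in the text.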
After fitting the IRT model to the data, the test information function was estimated as the sum of all items' information functions, which are calculated from their discrimination parameters (α) and the product of the probabilities of an affirmative (P) or negative (Q) answer to the item (Equation 2).
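Equation 2 is likewise not reproduced here. For the two-parameter logistic model, the usual expression consistent with the description above would be the following, where P_i(θ) is the probability of an affirmative answer to item i and Q_i(θ) its complement.

```latex
I(\theta) = \sum_{i} I_i(\theta)
          = \sum_{i} \alpha_i^{2}\, P_i(\theta)\, Q_i(\theta),
\qquad Q_i(\theta) = 1 - P_i(\theta)
```

Each item's information peaks at θ = b_i, where P_i = Q_i = 1/2, and the height of that peak is α_i²/4, so highly discriminating items contribute the most information near their own location.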
Using the test information function, the conditional errors of measurement (SE) were computed as presented in Equation 3. Contrary to CTT, where there is a single standard error of measurement for the scale, the (SE) in IRT shows the estimated error at different locations of the scale, making it possible to assess the level of imprecision in the measurement at different levels of θ.
Subsequently, items that provided less information were flagged for potential exclusion in a shortened version of the scale. Particularly, the information provided by the scale comprising all the original items (i.e., 26) was compared with that of reduced scales with 21 items, 18 items, 14 items, and 11 items, removing items that provided less information in a step-wise fashion. To provide validity evidence based on correlation for the test scores calculated with different sets of items, convergent validity was analyzed using information gathered with the Diagnostic Interview for Children (DISC-IV; Shaffer, Fisher, Lucas, Dulcan, & Schwab-Stone, 2000), an instrument that evaluates 32 common psychiatric diagnoses of children based on the Diagnostic and Statistical Manual - IV (DSM-IV; Bell, 1994). Convergent validity evidence was also examined through correlations between the total score and children's reported physical health (Aarons et al., 2008), exposure to bullying or discrimination (Cooke, Bowie, & Carrere, 2014), and exposure to selected adversities (e.g., Anda et al., 2006). The latter includes exposure to factors such as community crime, parental separation, major sickness, and other stressful events, which have been widely linked to mental health problems during childhood. All analyses were conducted in Stata 15.1 (StataCorp, 2017).
Results
Classical Test Theory and Dimensionality
The 26-item scale exhibits good reliability, with a Cronbach's alpha (α) of .74, suggesting that 74 percent of observed score variance is accounted for by true score variance, according to CTT. As shown in Table 1, according to CTT estimates, all the items have high location parameters (i.e., only a small proportion of children present the assessed symptoms, see also Figure A1 for histograms), whereas there is considerable variability in the amount of information that each item provides (according to item-test correlation), ranging from .20 (item 21, "Has the child needed to change school more than 3 times?") to .58 (item 12, "Have you noticed that the child has difficulty making friends of his or her same age?"). Nonetheless, these parameters are population dependent, so in a different administration with a different sample, they may vary.
The Kaiser-Meyer-Olkin (KMO) statistic of .81 shows that the sample is adequate for conducting factor analysis (Kaiser, 1974). A factor model was fit to determine whether a one-factor solution could represent the data. As shown in Figure 1, the first factor explained 76 percent of the total variance, suggesting that the mental health scale used in the ENSM is capturing a single underlying dimension (i.e., mental health problems). As shown in Figure 2, a summary score following a single-factor solution provides a skewed scale, as was expected given the low prevalence of the different items assessed and the purpose of the measurement (i.e., to identify mental health problems).
IRT and Scale Information
Table 3 summarizes item discrimination and location parameters for the 2PL IRT model fitted to the data. Consistent with the findings from CTT, items have high location parameters, ranging from 1.31 (item 12, "Have you noticed that the child has difficulties making friends of his or her same age?") to 4.71 (item 25, "Is the child eating too little and losing weight?") standard deviations above the mean. Furthermore, there is a larger dispersion in the estimated discrimination, ranging from 0.3 (item 25) to 2.54 (item 24, "Do you think the child is overeating for his age?"). Nonetheless, in general, most items have α parameters above one, suggesting that they distinguish among children with different levels of mental health problems. Figure A2 presents Item Characteristic Curves (ICC), where higher location parameters shift the ICC to the right, and a steeper slope reflects a higher discrimination parameter.
Even though high discrimination parameters are warranted to distinguish among children with different levels of mental health problems, the information provided by each item depends on its location along the latent scale. In the case of the mental health scale, most items have high location values, indicating that the items can distinguish among (and provide information for) children with the presence of mental health problems (which is the purpose of this measurement), as shown in Figure 3. Figure 4 also reflects this fact, showing that the overall scale provides more information for higher levels of θ. Consequently, the scale is more precise and reliable at higher levels of θ, having a lower conditional standard error of measurement between θ = 2 and 3 SD above the mean.
IRT's main assumption is local independence, which indicates that θ (i.e., the level of the latent trait) provides all the information needed to know the probability of an affirmative response to an item. Given this assumption, it is possible to estimate the information provided by different scales by adding or subtracting the corresponding item information functions (Jessen et al., 2018). Figure 5 presents the test information functions for scales composed of all 26 items, 21 items, 18 items, 14 items, and 11 items, subtracting items with lower levels of information in a stepwise fashion (see Table A2 for additional details). The reduction of items produces lower levels of information, but the reduction is not considerable. In general, even using the 11 most informative items would produce a reliable measure for high levels of θ (particularly around 2 to 3 SD above the mean).
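Because item information functions are additive under local independence, the effect of dropping items can be computed directly from the item parameters. The sketch below illustrates one possible selection rule in Python. The ranking criterion (mean item information over θ between 2 and 3), the function name `shorten_scale`, and the parameter values are assumptions made here for illustration; the text does not specify the exact rule used in the study.

```python
import numpy as np

def item_information(theta, a, b):
    """2PL item information a^2 * P * Q, returned with shape (n_theta, n_items)."""
    p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
    return a**2 * p * (1.0 - p)

def shorten_scale(a, b, n_keep, theta_range=(2.0, 3.0)):
    """Indices of the n_keep items with the highest mean information
    over theta_range (an assumed criterion, for illustration only)."""
    theta = np.linspace(theta_range[0], theta_range[1], 21)
    mean_info = item_information(theta, a, b).mean(axis=0)
    best = np.argsort(mean_info)[::-1][:n_keep]   # most informative first
    return np.sort(best)                          # report in original item order

# Hypothetical item parameters (not the ENSM estimates).
a = np.array([0.3, 1.1, 2.5, 1.6, 0.7, 1.9])
b = np.array([4.7, 1.3, 2.4, 2.0, 3.5, 2.8])

theta = np.linspace(-1.0, 4.0, 51)
full = item_information(theta, a, b).sum(axis=1)    # information of the full scale
keep = shorten_scale(a, b, n_keep=3)
short = item_information(theta, a[keep], b[keep]).sum(axis=1)  # reduced scale
```

Comparing `short` with `full` across the θ grid shows how much information is lost at each level of the latent trait. Removing items can never increase information, but the loss is small wherever the dropped items contributed little.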
Lastly, the correlational analysis reveals that the mental health scale has a statistically significant association, with the expected sign, with the number of psychiatric disorders identified using the DISC-IV, as well as with reported physical health, being discriminated against, and not being exposed to adversities (Table 4). All these correlations remain significant and have similar magnitudes when using reduced forms of the scale, suggesting that a shorter scale may be used if needed, given that it is reliable (Figure 5) and has correlational validity evidence (Table 4).
Discussion
In 2015, Colombia undertook its first nationally representative mental health survey (i.e., the ENSM) for children aged seven to 11 years. The ENSM included 26 items that were hypothesized to be measuring children's mental health problems, taken from the RQC (Giel et al., 1981), CBCL (Achenbach, 1999), and CBTD (Caraveo and Anduaga, 2007), with others based on the expertise of the group of researchers (Rodriguez et al., 2016). The items were based on measures whose score interpretation has validity evidence, offering content validity for the ENSM scale. Nonetheless, little was known about validity evidence based on coherence (i.e., internal structure). Moreover, it was not known whether a shorter scale would provide similar levels of information, thus being more efficient while retaining a high level of measurement precision.
The purpose of this study was to analyze the children's mental health scale using the CTT and IRT frameworks. The findings indicate that the scale has adequate internal consistency reliability, and the evidence from factor analysis suggests it is measuring a single latent construct. Furthermore, results from an IRT model show that most items have a high location, as can be expected, reflecting the fact that only individuals with a high θ (i.e., exhibiting mental health problems) will have even or higher odds of answering each item affirmatively. Given the local independence assumption, IRT also reveals that different items contribute substantially different levels of information to the total scale information, suggesting that a more efficient scale, employing only highly informative items, would be feasible. Indeed, findings from the item information function and convergent validity indicate that shorter scales will keep desirable psychometric properties and could be employed in future implementations of the mental health scale when increased efficiency is needed.
One major strength of this study and contribution to the Colombian literature is the use of an IRT framework, which, in contrast to CTT, theoretically offers population-invariant parameters that can accurately inform future implementations of each item. Indeed, the CTT framework produces parameters that depend on the specific population where the items are implemented, and which are, to a certain extent, predictable. For instance, following the Spearman-Brown prophecy formula, it is possible to infer that Cronbach's alpha will be higher when a test has more items and when the population where the test is implemented is more heterogeneous, whereas fewer items and a more homogeneous population would lead to lower test reliabilities (Traub & Rowley, 1991). On the other hand, IRT estimates item location, discrimination, and information that are assumed to hold on different occasions and in different populations (Embretson & Reise, 2013). These population-invariant parameters can inform the design of scales, maximizing precision at the desired θ level and permitting the selection of highly informative items when efficiency is paramount (Jessen et al., 2018).
Even though this study makes relevant contributions by offering validity evidence based on coherence and correlation for the ENSM children's mental health scale, it does not provide validity evidence based on response process (i.e., cognition) or consequences (Koretz, 2008). A future pilot study could be implemented to analyze the type of cognitive process respondents employ to respond to the scale's items, assessing whether some items may be particularly cognitively demanding. Moreover, it could be useful to conduct studies that make it possible to elucidate whether the test produces certain consequences for respondents, such as changes in their behaviors or interactions with their children following the test. It is also important to consider that the ENSM scale faces limitations that future studies should explore further. For example, the scale is based on parental reports, and these reports may be biased by parents' own mental health problems. Future efforts are needed to offer more evidence on the predictive validity of the scale for children's mental health problems in Colombia according to clinical assessments.
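The dependence of classical reliability on test length can be made concrete with the Spearman-Brown formula. The worked example below is an illustration only, not a result of the study. It assumes that the removed items are parallel to the retained ones, an assumption that the stepwise removal of the least informative items deliberately departs from, so the figure is best read as what a random shortening might yield.

```latex
\rho_{k} = \frac{k\,\rho}{1 + (k - 1)\,\rho},
\qquad
k = \frac{11}{26},\ \rho = .74
\;\Rightarrow\;
\rho_{k} = \frac{11 \times .74}{26 - 15 \times .74}
         = \frac{8.14}{14.9} \approx .55
```

Here k is the ratio of the new test length to the original length and ρ is the reliability of the original test. The drop from .74 to about .55 under this assumption shows why the selection of items, and not only their number, matters when a scale is shortened.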
Conclusion
The 26-item children's mental health scale used in the ENSM has adequate psychometric properties, and evidence from factor analysis suggests it is measuring a single latent construct. A 2PL IRT model reveals that the scale is accurate at distinguishing between children with high levels of θ, between around one and three SD above the mean. In a future implementation of the scale, when fewer items and higher efficiency are needed, a 21-, 18-, 14-, or even 11-item scale may hold desirable properties and predictive power. These findings suggest that future efforts can be conducted to continue monitoring children's mental health in Colombia, especially in the post-conflict situation, when it is necessary to identify children who would need additional support.