Introduction
Functional capacity (FC) is defined by Cabañero-Martínez et al. as a reflection of the integrity of the individual during aging, a product of the interaction of biological, psychological and social elements [1]. Podestá & Risso [2] refer to functional capacity as a person's ability to meet the needs that generate well-being, and conceptualize it in three fields: biological, psychological and social. With the increase in life expectancy and cultural changes regarding the aging process, concern about the functional capacity of senior citizens has gained relevance [3], particularly in view of the inequalities that functional limitations may accentuate among senior citizens [4]. Geriatric assessment scales are therefore used to evaluate the functional status of older adults, to establish the impact of disease on them, to follow them up, and thereby to gauge their capacity to perform activities of daily living independently.
Several scales are available to assess the functional capacity of senior citizens. Katz’s functional assessment scale in geriatrics [5] is one of those whose psychometric properties have been most studied, and one of the most widely used in related research [6]. It originated as the Katz Index of Independence in Activities of Daily Living, intended to evaluate dependency in senior citizens with hip fractures, and was later renamed the Katz Index [7]. The scale is theoretically grounded in the loss and recovery of functions [7] and assesses the degree of dependence or independence in six activities of daily living [8]. The Katz index has been considered particularly useful for the institutionalized elderly population and for those with health problems [9].
In 2000, Cohen & Marino [10] reviewed functional measures for disability research, including the Katz index, and identified deficiencies in its application and scoring standards as well as poor psychometric performance. In 2009, Cabañero-Martínez et al. [11] reviewed studies using the Spanish version of the Katz index and identified similar limitations. These authors concluded that the index has acceptable predictive performance but important limitations for its interpretation. More recent studies of the psychometric properties of the scale have shown better performance in terms of reliability and validity [12], but only one study has evaluated its interpretability [13].
Improving the interpretability of the Katz index would contribute to a better understanding of its results when it is applied as part of a health situation analysis, facilitating decision making and intervention planning. Improving interpretability requires a systematic review of its psychometric properties in light of recent recommendations for validating health [14,15] and psychology [16] measures, which allows examination of the theoretical rationale and empirical arguments that support its intended uses and interpretations. Thus, this study aims to evaluate the validity of the Katz geriatric functionality scale in senior citizens from a Rasch model perspective.
Methodology
Study design
Secondary data analysis of a population-based, cross-sectional study of health status in senior citizens conducted in Antioquia, Colombia, in 2012. A detailed description of the study design can be found elsewhere [17].
Participants
We included 4,023 people aged 60 years or older residing in the urban zones and populated centers of the department of Antioquia. These populated centers belong to the 10 subregions into which the department is divided, which, according to the population projections of the Departamento Administrativo Nacional de Estadística (DANE), had 671,590 senior citizens aged 60 or older in 2012. We excluded senior citizens living in retirement homes and those who scored lower than 24 points on the Mini-Mental State Examination (MMSE) at the beginning of the survey.
Data collection
Field work was carried out by a group of surveyors trained to a common standard in survey application, sampling, and information quality. Sampling was probabilistic, by clusters, in two phases, and representative by subregion. The sample size was calculated with Fleiss’ formula for finite populations, with a 95% confidence level, 5% sampling error, an assumed 50% proportion of good health status, and a design effect (deff) of 1.0; with a 10% addition for possible losses, the final sample comprised 4,248 people.
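The sample size parameters above can be illustrated with a minimal sketch. It assumes the standard finite-population formula for estimating a proportion, which may differ in detail from the exact expression the authors attribute to Fleiss; the function names are hypothetical. It is applied here to the whole department (671,590 people), so it does not reproduce the reported 4,248, which reflects representativeness by subregion.

```python
import math


def finite_population_sample_size(population, proportion=0.5, margin=0.05,
                                  z=1.96, deff=1.0):
    """Sample size for a proportion with finite-population correction.

    Assumed formula: n = N z^2 p q / (e^2 (N - 1) + z^2 p q), times deff.
    """
    variance_term = z ** 2 * proportion * (1.0 - proportion)
    n = population * variance_term / (margin ** 2 * (population - 1) + variance_term)
    return math.ceil(n * deff)


def inflate_for_losses(n, loss_rate=0.10):
    """Add a fixed percentage to cover expected non-response or losses."""
    return math.ceil(n * (1.0 + loss_rate))


base_n = finite_population_sample_size(671_590)
print(base_n, inflate_for_losses(base_n))
```

With these inputs the finite-population correction is negligible, so the result is essentially the familiar infinite-population value for p = 0.5 and a 5% margin.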
Rasch analysis
The dichotomous Rasch model [18] was used to analyze the Katz index. This model states that a positive response from a person n to an item i (Xni = 1) is a probabilistic function of the person’s ability level (θn) and the difficulty of the item (δi), so the odds of a positive response from a person of a certain ability level to an item of a given difficulty equal

$$\frac{P(X_{ni} = 1)}{P(X_{ni} = 0)} = e^{\theta_n - \delta_i}$$

Rasch-based analysis focuses on establishing whether data obtained from an instrument provide an invariant, one-dimensional, interval-scale representation of a latent attribute of interest. In this sense, the items that make up the instrument must comply with a series of fundamental requirements for measurement, which are evaluated through the fit of the data to the Rasch model [19].
We considered the recommendations for assessing health status related attributes given by the Medical Outcomes Trust [14]. The analyses were conducted in Winsteps 3.92.1 [20] and Stata 12 [21].
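As a minimal illustration of the dichotomous Rasch model described above, the sketch below computes the response probability and the odds from a person measure and an item difficulty, both in logits. The function names are hypothetical and are not taken from Winsteps or Stata.

```python
import math


def rasch_probability(theta, delta):
    """P(X = 1) for a person of ability theta on an item of difficulty delta."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


def rasch_odds(theta, delta):
    """Odds of a positive response: P(X = 1) / P(X = 0) = exp(theta - delta)."""
    return math.exp(theta - delta)


# A person whose ability equals the item difficulty has a 50% chance of a
# positive response; each additional logit multiplies the odds by e.
print(rasch_probability(0.0, 0.0))
print(rasch_odds(1.0, 0.0))
```

Because only the difference θn − δi enters the model, persons and items are placed on the same logit scale, which is what makes the item-person map possible.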
Item fit and Differential Item Functioning (DIF)
Item fit to the Rasch model was assessed with mean square (MNSQ) statistics. Both Infit and Outfit mean squares are reported. Fit was considered appropriate for values between 0.5 and 2.0 [22]. Item measures (δ) and their standard errors are also reported. Local dependency was analyzed with standardized residual correlations; values >0.7 were considered evidence of local dependency, as they would indicate approximately 50% or more of shared variance.
Invariance of item measurement across gender, illness report (yes/no), age group (<70 / ≥70), ethnicity (mestizo/other) and residential area type (urban/rural) was assessed with graphical and mathematical techniques. First, the DIF contrast was calculated for each item and p-values were obtained by the Welch and Mantel-Haenszel methods. Benjamini-Hochberg critical values [23] were calculated to control false positives due to multiple comparisons. DIF was considered significant if the p-value was lower than 0.05 and lower than the critical value. Second, 95% confidence Bland-Altman [24] agreement plots were constructed to assess the impact of DIF. Only items with a DIF contrast greater than 0.5 logits and located outside the agreement curves were considered to show relevant DIF.
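The Infit and Outfit mean squares mentioned above can be sketched as follows for a single item. This is an illustrative implementation of the usual definitions (Outfit as the unweighted mean of squared standardized residuals, Infit as the information-weighted version), not the Winsteps code; all names are hypothetical and person measures are assumed to be already estimated.

```python
import math


def rasch_probability(theta, delta):
    """Expected score of a dichotomous item under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


def item_fit(responses, thetas, delta):
    """Return (infit, outfit) mean squares for one item.

    responses: 0/1 answers, one per person.
    thetas: person measures in logits, same order as responses.
    delta: item difficulty in logits.
    """
    squared_residuals = []
    variances = []
    for x, theta in zip(responses, thetas):
        p = rasch_probability(theta, delta)
        squared_residuals.append((x - p) ** 2)
        variances.append(p * (1.0 - p))
    # Outfit: mean of squared standardized residuals (outlier-sensitive).
    outfit = sum(r / w for r, w in zip(squared_residuals, variances)) / len(responses)
    # Infit: squared residuals weighted by the model variance.
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit


def within_fit_range(mnsq, lower=0.5, upper=2.0):
    """Apply the 0.5 to 2.0 acceptance range used in this study."""
    return lower <= mnsq <= upper
```

Values below the lower bound indicate overfit (responses more predictable than the model expects), and values above the upper bound indicate underfit.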
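The decision rule for DIF described above can be sketched as follows. The Benjamini-Hochberg step-up procedure is implemented in plain Python; the function names, the example p-values and the `outside_agreement_band` flag (standing in for the visual Bland-Altman check) are hypothetical, and the standard procedure uses "less than or equal to" where the text says "lower than".

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return (critical_values, rejected), both in the original order.

    The critical value for the p-value of rank k (ascending) is k / m * q.
    All hypotheses up to the largest rank meeting p <= critical are rejected.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    critical = [0.0] * m
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        critical[idx] = rank / m * q
        if p_values[idx] <= critical[idx]:
            largest_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        rejected[idx] = rank <= largest_rank
    return critical, rejected


def relevant_dif(contrast, p_value, rejected, outside_agreement_band,
                 min_contrast=0.5, alpha=0.05):
    """Combine the statistical and practical criteria for relevant DIF."""
    return (rejected and p_value < alpha
            and abs(contrast) > min_contrast and outside_agreement_band)


# Hypothetical p-values for four item-by-group comparisons.
critical, rejected = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print(critical, rejected)
```

Requiring both a significant test and a contrast above 0.5 logits guards against flagging trivially small differences that reach significance only because of the large sample.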
The impact of DIF items on the functional measurement was analyzed with the Bland-Altman agreement plot by comparing person estimates (and standard errors) between the original scale and the unbiased scale (without DIF items).
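A minimal sketch of the agreement analysis follows. It computes the classical Bland-Altman bias and 95% limits of agreement between two sets of person estimates; the bands in the study's plots may instead incorporate the person standard errors, so this is an assumption, and the function name and example values are hypothetical.

```python
import statistics


def bland_altman_limits(measures_a, measures_b, z=1.96):
    """Return (bias, lower_limit, upper_limit) for paired measures.

    bias is the mean difference a - b; the limits are bias +/- z * SD of
    the differences (sample standard deviation).
    """
    differences = [a - b for a, b in zip(measures_a, measures_b)]
    bias = statistics.mean(differences)
    spread = statistics.stdev(differences)
    return bias, bias - z * spread, bias + z * spread


# Hypothetical person estimates (logits) from the six-item and four-item forms.
six_item = [1.0, 2.0, 3.0]
four_item = [0.5, 2.0, 2.5]
print(bland_altman_limits(six_item, four_item))
```

A bias far from zero indicates that one form systematically over- or underestimates the attribute relative to the other, even when individual differences stay within the limits.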
Dimensionality
Unidimensionality was assessed with two procedures. First, the proportion of raw variance explained by the Rasch measure was calculated; a proportion of at least 40% was considered acceptable evidence of test unidimensionality [25]. Second, principal component analyses were performed on the residuals, and results were expressed in terms of eigenvalues and the proportion of total residual variance explained by the contrasts. Contrasts with eigenvalues greater than 2 or explaining more than 10% of the variance were considered evidence of a possible second dimension [26]. For contrasts with these characteristics, three item clusters were constructed and the disattenuated correlations between them were calculated. Contrasts whose clusters showed disattenuated correlations greater than 0.5 were considered irrelevant [18], which supports the validity of a unidimensional approach to the Katz index.
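The two quantities used in this assessment can be sketched as follows, assuming the usual correction for attenuation (the observed correlation divided by the square root of the product of the two reliabilities). The function names and the example correlation and reliabilities are hypothetical.

```python
import math


def variance_explained(measure_eigenvalue, total_eigenvalue):
    """Proportion of raw variance explained by the Rasch measure."""
    return measure_eigenvalue / total_eigenvalue


def disattenuated_correlation(observed_r, reliability_a, reliability_b):
    """Correlation between two item clusters corrected for measurement error."""
    return observed_r / math.sqrt(reliability_a * reliability_b)


def cluster_contrast_is_irrelevant(observed_r, reliability_a, reliability_b,
                                   threshold=0.5):
    """Apply the study's rule: above 0.5 the clusters measure the same thing."""
    return disattenuated_correlation(observed_r, reliability_a, reliability_b) > threshold


# Hypothetical cluster measures correlating 0.6 with reliabilities 0.8 and 0.9.
print(disattenuated_correlation(0.6, 0.8, 0.9))
```

Because cluster measures based on two or three items are unreliable, the raw correlation between them understates how strongly the underlying attributes are related, which is why the corrected value is used.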
Reliability and item-person map
Item reliability, item separation and internal consistency were calculated. The item-person map is presented, and item hierarchy and targeting are analyzed [27].
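Separation and reliability are two expressions of the same information. The sketch below uses the standard relationship between them, R = G² / (1 + G²), where G is the separation index; the function names are hypothetical.

```python
import math


def reliability_from_separation(separation):
    """Reliability R = G^2 / (1 + G^2) for a separation index G."""
    return separation ** 2 / (1.0 + separation ** 2)


def separation_from_reliability(reliability):
    """Separation G = sqrt(R / (1 - R)) for a reliability R below 1."""
    return math.sqrt(reliability / (1.0 - reliability))


# A separation of 3 corresponds to a reliability of 0.9.
print(reliability_from_separation(3.0))
print(separation_from_reliability(0.9))
```

Separation is often easier to interpret at the top of the scale, where reliability values crowd toward 1 while separation continues to grow.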
Results
Overall, 4,023 senior citizens were measured with the Katz scale, and sociodemographic characteristics of the sample are presented in Table 1.
Variable | Category | n | % |
---|---|---|---|
Gender | Male | 1545 | 38.4 |
 | Female | 2478 | 61.6 |
Residential area type | Urban | 3697 | 91.9 |
 | Rural | 326 | 8.1 |
Ethnicity | Mestizo | 3449 | 85.7 |
 | Other | 574 | 14.3 |
Sick in the last four months? | Yes | 2779 | 69.1 |
 | No | 1244 | 30.9 |
Age in years | Q2 (Q1-Q3) (a) | 70 (64 - 77) | |
a: Q2: Median; Q1: First quartile; Q3: Third quartile
Item fit and Differential Item Functioning
Table 2 presents the measures and standard errors for each item in logits. All items showed acceptable Infit MNSQ, between 0.57 and 1.12. Items 1, 2, 3 and 6 (Bathing, Dressing, Toileting and Eating) showed overfit according to Outfit MNSQ, with values below 0.5. There was no evidence of local dependency, with standardized residual correlations below 0.09. Differential Item Functioning was detected in the items Transferring and Continence. For Transferring, difficulty was higher for rural, non-ill and under-70 older adults; for Continence, difficulty was higher for urban, ill and over-70 older adults.
Group | Statistic | Bathing | Dressing | Toileting | Transferring | Continence | Eating |
---|---|---|---|---|---|---|---|
Item statistics | Measure | 0.02 | -0.05 | 1.45 | -3.82 | -4.78 | 7.18 |
 | s.e. | 0.26 | 0.26 | 0.43 | 0.09 | 0.10 | 1.03 |
 | Infit MNSQ | 1.04 | 1.06 | 0.57 | 0.94 | 1.12 | 0.99 |
 | Outfit MNSQ | 0.29 | 0.46 | 0.04 | 0.87 | 1.41 | 0.03 |
Differential Item Functioning (DIF contrast) | Residential area type (a) | 0.00 | -0.06 | -0.98 | -1.34 | 1.55 | 0.00 |
 | Illness (b) | -0.61 | -0.68 | 0.17 | -0.67 | 0.79 | 0.00 |
 | Ethnicity (c) | 0.30 | 0.23 | -0.03 | -0.29 | 0.26 | 0.00 |
 | Sex (d) | 0.21 | -0.53 | -0.07 | 0.29 | -0.26 | 0.33 |
 | Age (e) | 1.30 | -0.82 | -0.48 | 0.99 | -1.00 | 0.15 |
s.e.: Standard error; MNSQ: Mean square; Bold: Significant according to Benjamini-Hochberg and Bland-Altman; Underlined: >0.5 logits. a: Urban vs. rural; b: Yes vs. no; c: Mestizo vs. other; d: Men vs. women; e: <70 vs. ≥70
Figure 1 shows the difference in person estimates between the original six-item scale and the unbiased form (without Transferring and Continence). The two measures differ among low scores, where the six-item score overestimates functionality compared to the four-item score. Although the differences are not statistically significant, the mean difference (red line) is higher than 0.5 logits. Some scores around the person mean (zero) are significantly underestimated (below the green agreement line). High scores show less deviation in the six-item scale.
Unidimensionality
The Rasch measure explained 51.6% of the raw variance, with an eigenvalue of 6.4 out of 12.4. Two contrasts explained more than 10% of the residual variance, with eigenvalues of 1.9 and 1.6, respectively. All the disattenuated correlations between the clusters identified within these contrasts were greater than 0.8, which is sufficient evidence to dismiss them as additional dimensions. However, when analyzing the first contrast, we identified two separate clusters: 1) the items Toileting, Continence, and Eating vs. 2) the items Bathing, Dressing, and Transferring.
Reliability and Item-Person map
Item reliability was 0.98 with a separation of 7.8, and the internal consistency of raw scores was 0.78. Figure 2 shows the Item-Person map. Person measures range from -5 to 8 with a mean of -5.59 (SD=1.72). A total of 3,365 people (85.6%) scored the lowest possible value, which indicates a serious floor effect. In contrast, only 42 people (1.1%) scored the highest possible value. The item hierarchy shows that limitation in eating (Eating) implies the highest dependency measure, and limitations in Transferring and Continence imply the lowest dependency measures in senior citizens according to the Katz Index.
Discussion
The items that make up the Katz index showed appropriate performance in terms of fit and unidimensionality. However, we identified considerable differential functioning of the items, mainly Transferring and Continence. When comparing the estimation of the attribute between the original scale and the one without the two items with DIF, we observed an overestimation by the original scale that is not statistically significant.
Various studies of the psychometric properties of the Katz index have been reported in the literature, most of them based on Classical Test Theory (CTT), which makes it difficult to compare them with this study, based on the Rasch model. However, it is possible to compare reliability, given that the reliability of the items estimated by Rasch is equivalent to that estimated by CTT methods [22]. In terms of the reliability of the scale, our results are similar to those reported in the literature, with values between 0.84 and 0.94 in Turkish [12], Iranian [28], Dutch and Moroccan [29], and Lebanese [30] elderly people. As in the Colombian case, it was not necessary to delete items because of poor performance in any of these studies.
One Rasch-based study of the Katz index has been published, by Gerrard [13], using secondary data from a 2004 national survey of retirement homes in the United States. The author analyzed information on 13,113 institutionalized elderly people and identified appropriate fit of the items according to the Infit and Outfit statistics and a consistency of 0.79. The author does not report analyses of unidimensionality, local dependency of the items or differential functioning. Gerrard’s article focuses on the hierarchy of the items, identifying Bathing (Measure=3.90) and Dressing (Measure=1.09) as the most difficult, and Eating (Measure=-2.63) and Continence (Measure=-1.58) as the easiest. These results differ from those obtained in Colombia, where the most difficult items were Eating and Toileting, and the easiest were Continence and Transferring. These differences could arise because the American elderly people were institutionalized and the Colombian ones were not. Nevertheless, the hierarchy is inverted for some items, which suggests low generalizability of the estimates and potential difficulties for international comparisons. These differences may also be due to the differential functioning of the items; however, this hypothesis cannot be tested because the North American study did not include such analyses.
The main limitation of this study lies in the way in which we evaluated health status and ethnicity. In both cases we relied on the self-report of senior citizens, which may affect the differential functioning analyses of the items through possible classification bias. Additionally, we obtained a strong floor effect: close to 86% of the participants had the lowest possible score. However, it is important to recognize that the Katz index was designed for clinical and rehabilitation purposes, and in population studies it should be used for screening of moderate to severe functional impairment [1, 5]; the interpretation of full scores as functional independence is discouraged. The floor effect could also explain the overfit identified in the Outfit statistics of Bathing, Eating, Dressing and Toileting, items that lie far from where the responses are concentrated. Nonetheless, given the independence of the estimation of item and person statistics that characterizes the Rasch model [25], this affects the precision of the item difficulties but not necessarily their validity.
This study has several strengths. It is the only study of the Katz index that performs a complete evaluation of validity arguments, including local dependence of the items, unidimensionality and differential functioning. In addition, these analyses followed the methodological guideline of the Medical Outcomes Trust, which gives the results substantive validity, in an epistemological sense, something that some operationally oriented psychometric studies lack. Lastly, the sample was obtained by means of probabilistic techniques and trained surveyors, with strict quality control of the data and of potential information bias.
Conclusions
The evidence of validity identified in this study shows that it is possible to obtain a valid, unidimensional and interval-level measure from the Katz instrument. Nevertheless, more research is needed on the differential functioning of the Katz items, so that validity arguments for its generalization to various subpopulations can be strengthened and score interpretability improved. These results are important for population research and rehabilitation practice with senior citizens, because they provide valid arguments for making decisions about functional status using the Katz index.