Introduction
Freudenberger and Maslach formally began the study of burnout syndrome during the 1970s. Considered a chronic response to work-related stress, burnout is explained as a progressive symptomatology of emotional exhaustion, depersonalization, and low personal accomplishment 1-7. Highly demanding jobs together with decreasing resources enable the onset of burnout, and its consequences impact workers' health and organizations' growth 8,9. In addition, the effects resulting from burnout have proven to cause issues in workers' physiological (e.g., heart disease) and psychological (e.g., insomnia) conditions, in addition to affecting their performance at work (e.g., increased absenteeism) 10.
Burnout is a particularly sensitive issue in jobs where health care activities are inherent to the position. As an analogy to the typical teacher-student relationship, medical work also requires constant interaction with patients 7,11. This iterative process gives rise to overloading work demands, and work-family conflicts, job insecurity, and interpersonal problems emerge, which are considered sources of burnout 12-14. Demands at work may vary based on the socio-cultural and working scenario for physicians; however, in the particular case of burnout, there is agreement on its high prevalence across different regions of the world 15. Therefore, detecting this psychosocial risk is especially important in medical contexts, particularly where research reports on this topic are still considered to be on the rise (e.g., Peru). This emerging situation is associated with insufficient methodological accuracy of publications addressing this subject as well as a low number of studies on the metric evidence concerning the validity of the instruments used 16. This study specifically assessed the Maslach Burnout Inventory (MBI).
The MBI is a psychological measurement instrument used to assess burnout. In its third edition, three versions were created targeting a) health professionals and social service workers (MBI-Human Services Survey, MBI-HSS), b) teachers (MBI-Educators Survey, MBI-ED), and c) general workers (MBI-General Survey, MBI-GS) 11. MBI-HSS was the first version of the instrument; it was developed from clinical experiences and comprises 22 items referring to the constant worker-client (patient) interaction, explaining the burnout symptomatology through three dimensions: Emotional Exhaustion (EE, the feeling of being exhausted and overwhelmed physically and psychologically; it represents the central dimension of burnout), Depersonalization (DE, a cynical and detached attitude towards work and colleagues), and Low Personal Accomplishment (PA, the tendency to negatively assess personal achievements at work; this dimension is positively assessed) 17.
MBI-HSS is one of the most widely used instruments globally 18,19. Peru is one of the countries that continuously reports its application in investigating burnout. MBI-HSS, however, poses questions that affect its usefulness, associated with its translations and the divergence between theory and practice 20. This has occurred mainly in Spanish-speaking contexts, thus resulting in a deviation from the instrument's conception and from the understanding of the construct 21.
The original version of MBI-HSS recommended the exclusion of items 12 (PA) and 16 (EE) for internal structure verification purposes, as these presented significant factor loadings in EE and DE dimensions, respectively 11. These recommendations obtained greater empirical evidence after the systematic and meta-analytical review conducted among female nurses 22. Several studies also reported inconsistency in the MBI-HSS internal structure across different work activities (e.g., nurses and other jobs), concluding the need to eliminate some items or modify their dimensions 23-29. These results and issues were also present in medical samples 30-32.
For several years now, the Peruvian government has recommended the study of burnout and its causes, in response to scarce research 33. In this endeavor, in 2014, the Instituto Nacional de Estadística e Informática (National Institute of Statistics and Information Technology, INEI) and the National Health Superintendency (Susalud) participated in the burnout assessment (through the use of MBI-HSS) as part of the User Satisfaction Survey (Ensusalud), a physician survey with a free database (http://portal.susalud.gob.pe/blog/base-de-datos-2014/). Including MBI-HSS can serve as a proxy for information related to the psychosocial discomfort of Peruvian physicians, mainly due to the assessment's national scope. However, the MBI-HSS metrics featured in this study have not been verified, and given that this instrument's measurement issues are globally known, it is not only important but also necessary to assess whether the representation of burnout by the MBI-HSS items remains consistent with the theory.
In this regard, a search for validation studies of MBI-HSS was conducted in databases such as Scopus, PubMed, and Scientific Electronic Library Online (SciELO) in Peru, using keywords in Spanish and English according to the following categories: a) construct: burnout syndrome; b) study type: validación (validation); c) sample: médicos, doctor (physicians, doctor); and d) nationality: peruano (Peruvian). The search returned no validation studies among the Peruvian population. A complementary free search was conducted using Google, identifying one study that obtained evidence of content validity from the assessment performed by three judges who reached a certain degree of agreement 34. However, no further details were provided regarding the judges' expertise or the kind of analysis used to assess the agreement and its results, casting doubts on this validation process and probably on the conclusions derived from the study's main objective. Such efforts can be considered emergent and preliminary in a context of scientific accuracy.
Despite the lack of instrumental publications and inadequate validity procedures for burnout in Peruvian physicians, empirical reports present results on the prevalence of burnout and its association with social and demographic variables by using MBI-HSS 35-37. Barring the report by Solis et al. that examined the content validity of MBI-HSS, the remaining studies justified the use of this instrument with the validation process performed in contexts other than Peru, a premise adopted assuming cultural similarity 38. This practice, known as validity and reliability induction, is not recommended, as it involves transferring the validity or reliability evidence for a construct developed in one context to a new one, without sufficient justification or verification of the new sample and of the suitability of its assessment 39. Taking into account the recommendations of the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education, assessing the internal structure of measurement instruments in contexts not previously examined is an essential requirement and a condition for other sources of validity evidence that follow the definition of the internal structure 40.
As the characterization of burnout is critical to clinical intervention, and with the above considerations in mind, this manuscript examines the factorial structure of MBI-HSS in Peruvian physicians using secondary data analysis. Moreover, the invariance of the measurement will be tested based on the item response theory through the analysis of differential item functioning (DIF) based on gender.
Materials and Methods
Research Design
This is an instrumental research design, which consists of applying one or more methodological strategies to examine the metric properties of a measurement 41,42.
Sample
Peruvian physicians were selected through two-stage stratified probability sampling (http://portal.susalud.gob.pe/wp-content/uploads/archivo/encuesta-sat-nac/2014/FiCHA-TECNICA.pdf). A total of 2228 physicians participated in this study, of which 6 were withdrawn as their MBI-HSS scores were not recorded in the survey database, leaving a total of 2222 participants for this study's analysis.
According to the sociodemographic characteristics, a greater proportion of the physicians were male (76.2%); ages ranged from 24 to 83 years (M = 45.41; SD = 11.061). Participants worked at the Ministry of Health-Regional Government institutions (MINSA-GR; 45.1%), ESSALUD (47%), private clinics (6.3%), the Peruvian Armed Forces (FFAA), and the Peruvian National Police (PNP; 1.6%). Their work experience ranged from 1 to 53 years (M = 16.50; SD = 10.00), with the following contract types: indefinite, appointed, and permanent (45.7%), undetermined according to Legislative Decree No. 728 (24.6%), special regime of administrative service contracting (CAS) (13.1%), fixed-term contract (6.8%), professional fees (4.7%), and other (5%). It is also known that 43.7% of physicians hold other healthcare-related positions and 66% are specialized in their field of work.
The city of residence was distributed as follows: Lima (26%), Arequipa (8.6%), La Libertad (6.8%), Lambayeque (6.5%), Ica (5%), Ancash (4.7%), Cusco (4.1%), Tacna (3.6%), Junin (3.4%), Piura (3.2%), Huánuco (2.9%), San Martín (2.7%), Cajamarca (2.7%), Apurimac (2.7%), Puno (2.6%), Ayacucho (2.3%), Loreto (2.2%), Tumbes (1.9%), Ucayali (1.6%), Moquegua (1.6%), Amazonas (1.5%), Callao (1.3%), Madre de Dios (1.1%), Huancavelica (0.7%), and Cerro de Pasco (0.4%). The physicians' marital status was classified as married (63.7%), single (23%), cohabitating (6.7%), divorced or separated (5.9%), and widowed (0.7%). The rest of the detailed information can be found in the aforementioned database.
Instrument
Maslach Burnout Inventory Human Service Survey (MBI-HSS, 11)
This instrument is composed of 22 items grouped into 3 dimensions: Emotional Exhaustion (EE, 9 items; "Me siento emocionalmente agotado por mi trabajo"), Depersonalization (DE, 5 items; "Creo que trato a algunos de mis pacientes como si fueran objetos impersonales"), and Personal Accomplishment (PA, 8 items; "Trato muy eficazmente los problemas de los pacientes"). Each item is answered on a 7-point Likert scale ranging from 0 (never) to 6 (every day). The origin of the translation or adaptation of the MBI-HSS used for this national survey could not be identified.
Analysis
Descriptive Statistics. Descriptive statistics of the Mean (M), Standard Deviation (SD), Asymmetry (g1), and Kurtosis (g2) were examined.
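As an illustration of these descriptive statistics, the following minimal sketch computes M, SD, g1, and g2 for a single item. It assumes the moment-based definitions of g1 and g2 (with g2 expressed as excess kurtosis); the function name `describe` and the item responses are hypothetical and are not taken from the survey database.

```python
import math

def describe(scores):
    """Return mean, sample SD, skewness (g1) and excess kurtosis (g2)."""
    n = len(scores)
    mean = sum(scores) / n
    # Central moments divided by n, as used in the moment-based g1 and g2.
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    m4 = sum((x - mean) ** 4 for x in scores) / n
    sd = math.sqrt(m2 * n / (n - 1))  # sample SD with n - 1 in the denominator
    g1 = m3 / m2 ** 1.5               # asymmetry (skewness)
    g2 = m4 / m2 ** 2 - 3.0           # kurtosis relative to the normal distribution
    return {"M": mean, "SD": sd, "g1": g1, "g2": g2}

# Hypothetical responses to one item on the 0-6 scale.
item = [0, 1, 1, 2, 2, 2, 3, 3, 4, 6]
print({name: round(value, 3) for name, value in describe(item).items()})
```

Statistical packages may apply small-sample corrections to g1 and g2, so their output can differ slightly from these uncorrected values.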
Internal Structure. To examine the evidence of validity of the internal structure, the study sample was initially randomly divided into two groups.
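A random split of this kind can be sketched as follows. The group sizes (1089 and 1133) are those reported for the two groups; the seed, the function name `split_sample`, and the use of consecutive participant identifiers are assumptions made only for illustration, since the procedure actually used for the split is not described.

```python
import random

def split_sample(ids, n_first, seed=2014):
    """Randomly split participant ids into two non-overlapping groups."""
    rng = random.Random(seed)  # fixed (hypothetical) seed so the split is reproducible
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return shuffled[:n_first], shuffled[n_first:]

participant_ids = range(1, 2223)  # 2222 physicians retained for analysis
efa_group, cfa_group = split_sample(participant_ids, n_first=1089)
print(len(efa_group), len(cfa_group))  # 1089 1133
```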
In the first group (n = 1089), the Exploratory Factor Analysis (EFA) was carried out using the Factor program, version 10.7.01 43. The covariance matrix was specified according to the ordinal nature of the responses, and unweighted least squares extraction was used with the Promin rotation method. Horn's parallel analysis was applied to identify the number of factors 44. The adequacy of the correlation matrix was assessed using the Kaiser-Meyer-Olkin (KMO) index and Bartlett's sphericity test. Based on this result, factorial simplicity was evaluated to verify the amount of variance in the factor loadings attributable to each item's own dimension compared with the other dimensions, at the item (ISF-I) and factor (ISF-F) levels. This analysis was conducted using the SIMLOAD program 45.
In the second sample (n = 1133), the model resulting from the EFA was tested through a confirmatory factor analysis (CFA), implemented in the EQS program, version 6.2 46. Polychoric correlation matrices were used for the analysis. Because multivariate normality was rejected by the Mardia test (222.7499), the maximum likelihood estimation method was implemented with the Satorra-Bentler correction (SB-χ²). The fit indices selected were RMSEA (< 0.05; 90% CI), CFI (> 0.95), and SRMR (< 0.05) 47. Factor loadings were expected to be > 0.40 48.
In addition, possible misspecifications of the analyzed models were assessed by checking the Sörbom Modification Indices (MI) 49. Their magnitude was evaluated using the SSV-SPSS program, with the following criteria: minimum factor loading of 0.40, statistical power of 0.75, and minimum expected parameter change (EPCMIN) of 20 50,51.
Differential Item Functioning (DIF). The monotonic association coefficient, partial gamma (γp), was implemented 52. The use of this coefficient is suggested to identify the conditional relationship between items and grouping variables (e.g., gender), controlling for the score of the examined scale 53-55. The levels used to qualify the results 54,55 were weak (|γp| from 0.00 to 0.15), moderate (|γp| from 0.16 to 0.30), and strong (|γp| > 0.31).
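The logic of the partial gamma coefficient can be sketched as follows: concordant and discordant pairs between the item response and the grouping variable are counted only among respondents with the same scale score, and the counts are pooled across score levels. This is a minimal illustration with hypothetical data and function names, not the software used in the study; the handling of values that fall between the published cut-offs (e.g., between 0.15 and 0.16) is an assumption.

```python
from collections import defaultdict

def partial_gamma(item, group, stratum):
    """Partial gamma between item and group, controlling for stratum."""
    by_stratum = defaultdict(list)
    for x, g, s in zip(item, group, stratum):
        by_stratum[s].append((x, g))
    concordant = discordant = 0
    # Pairs are compared only within the same stratum (same scale score).
    for pairs in by_stratum.values():
        for i in range(len(pairs)):
            for j in range(i + 1, len(pairs)):
                sign = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
                if sign > 0:
                    concordant += 1
                elif sign < 0:
                    discordant += 1  # pairs tied on item or group are ignored
    total = concordant + discordant
    return (concordant - discordant) / total if total else 0.0

def dif_label(gamma_p):
    """Qualify |gamma_p| with the cut-offs stated in the text (boundaries assumed)."""
    size = abs(gamma_p)
    if size <= 0.15:
        return "weak"
    if size <= 0.30:
        return "moderate"
    return "strong"

# Hypothetical data: item response (0-6), gender (0/1), scale score level.
item = [2, 3, 1, 4, 5, 3, 0, 1]
gender = [0, 1, 0, 1, 0, 1, 0, 1]
score = [10, 10, 10, 20, 20, 20, 5, 5]
g = partial_gamma(item, gender, score)
print(round(g, 3), dif_label(g))
```

The pairwise loop grows quadratically with the number of respondents per score level, which is acceptable for a sketch but would be replaced by contingency-table counts in practice.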
Reliability. This was evaluated through internal consistency by means of the α coefficient and the ω coefficient 56,57. Values greater than or equal to 0.70 were considered to indicate adequate internal consistency.
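The two coefficients can be illustrated with a minimal sketch. It assumes the usual formulas: α computed from item and total-score variances, and ω computed from the standardized loadings of a one-factor model with uncorrelated errors. The function names and the response matrix are hypothetical; the loadings passed to `omega` are the DE loadings listed in Table 2, used here only as illustrative input.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Alpha from raw scores; rows is a list of respondents (lists of item scores)."""
    k = len(rows[0])
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def omega(loadings):
    """Omega from standardized loadings, assuming uncorrelated errors."""
    common = sum(loadings) ** 2
    unique = sum(1 - l ** 2 for l in loadings)  # standardized error variances
    return common / (common + unique)

# Hypothetical responses of five physicians to three items (0-6 scale).
responses = [[1, 2, 1], [3, 3, 4], [5, 4, 5], [0, 1, 1], [6, 5, 6]]
print(round(cronbach_alpha(responses), 3))

# DE loadings from Table 2 (items 5, 10, 11, 15, 22).
print(round(omega([0.751, 0.795, 0.815, 0.701, 0.531]), 3))  # about 0.845
```

Because α assumes equal loadings across items whereas ω does not, the two coefficients can diverge when loadings are heterogeneous.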
Results
Internal Structure
The KMO result was satisfactory (0.90) and Bartlett's sphericity test was significant (p < 0.001), being suitable to perform the EFA.
Exploratory Factor Analysis. The EFA (table 1) replicated the three-dimensional structure of MBI-HSS; three items were removed from further analysis because one of them was a Heywood case (Burnout 1), whereas the other two (Burnout 16; Burnout 21) presented factor loadings below the expected criterion. The EFA was then rerun, and all factor loadings were greater than 0.40. This result was consistent with the ISF-I, which showed that the items were influenced mainly by their own latent dimensions, and it was also corroborated at the factor level (ISF-F).
M = Mean; SD = Standard Deviation; As = Asymmetry; Ku: Kurtosis; EE = Emotional Exhaustion; DE = Depersonalization; PA = Personal Accomplishment; ISF-I = Item Level Factorial Simplicity Index; ISF-F = Factor Level Factorial Simplicity Index.
Confirmatory Factor Analysis. Concerning the CFA (table 2), the fit indices were partially adequate for the 19-item structure obtained from the exploratory analysis: SB-χ² (149) = 720.016 (p < 0.001), CFI = 0.977, RMSEA = 0.058 (CI = 0.054, 0.062), and SRMR = 0.063. Factor loadings were moderate to high (between 0.53 and 0.82); the correlations between dimensions were theoretically consistent; however, EE and DE showed high covariation. Regarding the specifications for improving fit (modification indices, MI), the MI identified 79 pairs of correlated errors between items, which affected the change in SB-χ². When evaluating the practical significance of these MI, 19 (11.17%) of these pairs were relevant, but only 16 of them were selected because they were present within the same factor. The result was an improvement in the fit indices: SB-χ² (116) = 396.734 (p < 0.001), CFI = 0.987, RMSEA = 0.046 (CI = 0.041, 0.051), and SRMR = 0.055. The application of these new parameters indicated the existence of non-factor systematic variance.
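As a check on how these indices relate to one another, the RMSEA point estimates can be recomputed from the reported SB-χ², degrees of freedom, and the size of the second sample (n = 1133). The sketch below assumes the common formula RMSEA = sqrt(max(χ² - df, 0) / (df (n - 1))) and, for the degrees of freedom of the initial model, a three-factor specification with factor variances fixed to 1; EQS may apply a slightly different computation, and the function names are hypothetical.

```python
import math

def cfa_df(n_items, n_free_params):
    """Degrees of freedom: unique variances/covariances minus free parameters."""
    moments = n_items * (n_items + 1) // 2
    return moments - n_free_params

def rmsea(chi2, df, n):
    """RMSEA point estimate from a chi-square statistic (n - 1 convention)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

# Assumed free parameters: 19 loadings + 19 error variances + 3 factor correlations.
print(cfa_df(19, 19 + 19 + 3))              # 149
print(round(rmsea(720.016, 149, 1133), 3))  # 0.058 (initial 19-item model)
print(round(rmsea(396.734, 116, 1133), 3))  # 0.046 (model with correlated errors)
```

Both values agree with the RMSEA estimates reported above to three decimals.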
Item | EE (λ) | DE (λ) | PA (λ) | h2 | DIF: γp | DIF: 95% CI | DIF: Homogeneity-χ² (df = 9)
---|---|---|---|---|---|---|---
MBI 2 | .759 | | | .576 | -.116 | -.234, .002 | 15.67
MBI 3 | .786 | | | .618 | .141 | .027, .255 | 6.23
MBI 6 | .724 | | | .525 | -.022 | -.138, .093 | 4.58
MBI 8 | .801 | | | .642 | .136 | .027, .245 | 13.32
MBI 13 | .666 | | | .443 | .067 | -.052, .186 | 1.56
MBI 14 | .612 | | | .374 | -.000 | -.103, .103 | 13.52
MBI 20 | .621 | | | .386 | -.062 | -.176, .052 | 11.15
MBI 5 | | .751 | | .564 | -.093 | -.225, .040 | 13.80
MBI 10 | | .795 | | .632 | .315** | .202, .427 | 9.76
MBI 11 | | .815 | | .664 | .053 | -.070, .176 | 6.92
MBI 15 | | .701 | | .491 | -.200 | -.370, -.029 | 1.22
MBI 22 | | .531 | | .282 | -.106 | -.226, .014 | 12.53
MBI 4 | | | .618 | .382 | .006 | -.123, .135 | 13.18
MBI 7 | | | .708 | .501 | -.104 | -.236, .028 | 3.40
MBI 9 | | | .687 | .472 | -.032 | -.162, .067 | 1.05
MBI 12 | | | .687 | .473 | -.141 | -.259, -.022 | 10.09
MBI 17 | | | .778 | .605 | .204 | .076, .329 | 12.18
MBI 18 | | | .819 | .671 | .064 | -.063, .191 | 12.37
MBI 19 | | | .770 | .593 | .016 | -.104, .137 | 7.63
Correlation | | | | | | |
EE | 1 | | | | | |
DE | .811 | 1 | | | | |
PA | -.491 | -.581 | 1 | | | |
Reliability | | | | | | |
α | .834 | .738 | .794 | | | |
ω | .918 | .845 | .909 | | | |
ω corrected | .447 | .335 | .517 | | | |

λ = Factor Loadings; h2 = Communality; α = Cronbach's alpha; ω = Omega; γp = partial gamma; DIF = Differential Item Functioning; CI = confidence interval; df = degrees of freedom. **p < 0.01.
Differential Item Functioning (DIF). Table 2 shows the results of the DIF evaluation. Barring item 10, the γp coefficients were not statistically significant (p > 0.40) and were predominantly small in magnitude, as indicated by the point estimates and their confidence intervals. The homogeneity of the γp estimated at each score level (see the Homogeneity-χ² column) indicated that direction and size tend to be similar.
Reliability. Based on α and ω, reliability was acceptable (> 0.70), but the presence of correlated errors required adjusting these estimates using the Raykov method 58. When the 16 pairs of correlated errors were modeled, the reliability estimates (ω corrected) decreased considerably in all three dimensions (table 2). Discrepancies were also observed between α and ω, in which α was always lower than uncorrected ω.
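The direction of this adjustment can be illustrated with a minimal sketch. It assumes the usual composite reliability formula in which twice the sum of the error covariances is added to the denominator, so that positive error covariances lower the estimate. The loadings are the DE loadings from Table 2; the error covariances are hypothetical values chosen only for illustration, because the estimated covariances are not reported, and the function name is likewise an assumption.

```python
def omega_with_error_covariances(loadings, error_covariances=()):
    """Omega adjusted for correlated errors (standardized solution assumed)."""
    common = sum(loadings) ** 2
    unique = sum(1 - l ** 2 for l in loadings)  # standardized error variances
    correlated = 2 * sum(error_covariances)     # each covariance counts twice
    return common / (common + unique + correlated)

de_loadings = [0.751, 0.795, 0.815, 0.701, 0.531]
uncorrected = omega_with_error_covariances(de_loadings)
# Hypothetical error covariances for two item pairs within the factor.
corrected = omega_with_error_covariances(de_loadings, [0.20, 0.15])
print(round(uncorrected, 3), round(corrected, 3))
```

With these illustrative covariances the estimate drops only modestly; the much larger decreases reported in Table 2 depend on the covariances and loadings actually estimated in the respecified model.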
Discussion
This study examined the validity of the internal structure of MBI-HSS in Peruvian physicians using secondary data collected by the 2014 Ensusalud survey. The results indicated the presence of a three-dimensional structure equal to the original proposal, although three items were eliminated due to a Heywood case (item 1, EE) and low factor loadings (item 16, EE; item 21, PA) 11,17. When considering the resulting factorial model of 19 items, the correlation between EE and DE was stronger, while the correlation between EE and PA was lower but around 0.50. In a more general context for interpreting our results, the decision to maintain or eliminate items affected the size of the correlations between the factors and the factor loadings. The item removals carried out in this study as well as in previous studies suggest that the correlations between the constructs of MBI-HSS obtained in our study (and in other studies) require careful interpretation and may be of limited generalizability, owing to content variations in the instrument's structure 23,25,59,60. An additional implication of this inconsistent reduction of MBI-HSS is that the interpretation of the scores is not directly comparable between studies that removed different items.
Along with the problem of MBI-HSS content being altered by eliminating items, the inclusion of correlated errors indicates the inadequacy of the factor in retaining the variance that explains the responses to the items. The existence of correlated errors in this study may be a consequence of the application context, the reactions of those examined, the item content, or the interaction between these components. It was impossible to detect which of these had the greatest effect on the present sample, but it is clear that the measurement quality of MBI-HSS is compromised.
Another methodological consequence of adding correlated errors concerned the calculation of reliability. Indeed, internal consistency reliability (uncorrected for correlated errors) reached acceptable levels, but after modeling the pairs of correlated errors, the ω coefficient of each factor decreased considerably. This effect is not unknown in the methodological literature, and it is highly advisable to include the correlated errors to obtain greater accuracy when calculating reliability 58. Therefore, these adjusted estimates are theoretically more appropriate and indicate that the item variance attributable to the latent factor is less than 0.70 (the criterion chosen), and even less than 0.60. One implication of this result is that error variance around the score leads to an inaccurate interpretation of scores as well as low replicability, thus making it difficult to guarantee the reproducibility of an examinee's score. Furthermore, the uncorrected ω coefficient (which represents a model wherein items vary in their degree of validity with respect to their factor) was greater than the α coefficient for each factor, which suggests that a coefficient other than α may be needed to estimate the measurement error in the version validated herein.
Given the representativeness of the sample and the metric results obtained, MBI-HSS in its current state (original version or modified by this study) does not appear to be an option for measuring burnout in Peruvian physicians. This result is similar to other validation studies in nurses regarding item reduction 30,31. Considering the recommendations of Maslach, Jackson and Leiter 11 on removing items 12 and 16, it appears that this removal should be extended to more than two items. In this sense, it is necessary to verify the factor structure of MBI-HSS in each medical group, and by implication, in each Peruvian work group in which the instrument is used. This type of evaluation reduces the risk of inducing validity when it is not justified 39.
Differential item functioning (DIF) was not apparent in the adapted version, which supports the comparisons that can be made between men and women in MBI-HSS. Although only one item was detected (item 10 of Depersonalization), the reason for this differentiated response is unclear and requires a mixed methods study involving preliminary quantitative analysis and post hoc qualitative analysis. The homogeneity assessment of the γp coefficients suggests that the variability due to DIF is uniform and could suggest the absence of non-uniform DIF. However, the detection of DIF in this item is only one step in determining whether bias in its interpretation is present. Overall, and given the magnitude of the gamma coefficients, it can be concluded that the version obtained in this study does not produce apparent significant differential performance.
In addition, the findings of this study provide an overview of the appropriate use of MBI-HSS, especially given that it is the most frequently reported instrument for burnout assessment. For Peruvian healthcare professionals in particular, this first report on internal structure indicates that the use of the original 22-item version can be disregarded and, as a result, previous studies with MBI-HSS could potentially be disregarded as well. This situation may be more serious if these studies have entailed making decisions that affect the participants involved, thus leading to ethical implications about the inappropriate use of instruments involved in employment or clinical decision-making.
That burnout continues to be one of the most frequent problems in the medical profession is unlikely to be questioned. The impact of its prevalence in Peru cannot yet be detailed, but it can be deduced from international studies that this psychosocial risk deserves to be addressed because of its levels, what causes it, and the consequences that it may entail for physicians and their organization.
The limitations of this study include the lack of disaggregation according to the specific occupation of the medical staff; the absence of other factorial models for understanding burnout (e.g., bi-factor model); and the lack of verification of other evidence of validity, such as exploring its relationship with other tests with a similar purpose or evaluating the degree of prediction it provides for constructs such as depression, anxiety, somatic symptoms, and sleep problems 61,62. Furthermore, the type of DIF analyzed was uniform, and a formal examination of non-uniform DIF is required. Finally, no analysis that directly estimates factor cross-loadings was applied (e.g., ESEM: exploratory structural equation modeling), which may give a complementary or different perspective on the analysis performed herein 63.
Finally, we would like to add two recommendations. First, there is a potential need to obtain an abbreviated version with generalizable and robust metric qualities that guarantees the appropriate interpretation of scores. Indeed, with a balanced number of items representative of the core content of the constructs measured by MBI-HSS, this goal is of scientific and practical value. Second, inducing the validity of MBI-HSS without verification compromises the ethics of the researcher and the effectiveness of an instrument intended to identify a problem requiring clinical and psychosocial care.