Introduction
Classroom management is a matter of concern among teachers everywhere. In fact, Gordon (as cited in Okutan, 2005) indicates that managing a classroom can be a critical challenge, especially for beginner teachers, but even for experienced ones. Being the first professional activity to be developed, classroom management is assumed to be part of teachers' duties and one of their main responsibilities (Marzano, 2003).
Hence, it is highly relevant for both teachers and other stakeholders to identify the classroom management techniques that teachers use or are more likely to use. This can be helpful for a number of reasons: 1) to be aware of the techniques teachers mostly tend to use; 2) to identify patterns of behavior; 3) to find out which ones are more effective; 4) to identify teachers' beliefs behind their actions inside and outside the classroom; and, one of the most relevant ones, 5) to enable pedagogical reflection by making teachers aware of their teaching process in order to identify weaknesses and strengths, as well as possible modifications of their practices. In fact, Martin, Schafer, McClowry, Emmer, Brekelmans, Mainhard, and Wubbels (2016) concluded that classroom management is a powerful component of the overall classroom climate that affects students' behavior, engagement, and, by extension, the quality of students' learning.
Questionnaires are one of the main tools that contribute to the effective investigation of classroom management. Researchers use them widely and frequently to collect relevant data with the purpose of reaching and supporting their findings. It is important, then, to count on reliable and valid questionnaires that reflect current views about classroom management. While examining old and new instruments used for different types of research on classroom management practices and teachers' beliefs behind those practices, it was uncovered that, even though there exists an array of them, one of their weaknesses is that they are outdated. For example, most of the instruments do not include current topics like social networks, parental involvement, or new findings and understandings of the topic. Moreover, a number of questionnaires that do include recent views on classroom management are addressed specifically to teachers who teach children, not to English teachers in general. Therefore, there are no recent instruments that deal with classroom management techniques used specifically by English teachers, making the creation of a questionnaire on this area paramount.
This study aims at approaching the validation of a Classroom Management Questionnaire (CMQ) using two member-checking techniques (Delphi and Fleiss' Kappa) and at estimating the CMQ's internal consistency using Cronbach's alpha. This paper is part of the Fondecyt 1150889 research grant, "Las dimensiones cognitivas, afectivas y sociales del proceso de planificación de aula y su relación con los desempeños pedagógicos en estudiantes de práctica profesional y profesores nóveles de pedagogía en inglés."
Theoretical Framework
The concept of classroom management has been widely defined, and every author explains it from a different perspective. According to Özcan (2017), "classroom management is an ongoing interaction between teachers and their students" (p. 111). Consequently, the concept can be understood as all the actions performed by the teacher to create and maintain a learning environment that enables successful instruction. This includes a variety of techniques, like arranging the physical environment, establishing rules and procedures, maintaining students' attention to lessons, and engagement in activities (Özcan, 2017).
Classroom management has also been defined as the actions teachers take to create a supportive environment for the academic and social-emotional learning of students (Özcan, 2017). Therefore, classroom management can be seen as all the actions that a teacher performs inside a school in order to enable learning. Classroom management can thus be conceived as all the educational decisions teachers make (Marzano, Marzano, & Pickering, 2003).
Historical Review of Classroom Management Questionnaires
The first attempt to measure classroom management practices was made by Willower, Eidell, and Hoy (1967) with the Pupil Control Ideology (PCI) scale. The PCI form, as described by Hoy (2001), is a 20-item Likert-type scale with 5 response categories for each item ranging from "strongly agree" to "strongly disagree." This inventory is based on an ideological continuum going from custodial (more controlling; teacher does not attempt to understand student's misbehavior) to humanistic (less controlling; teacher believes student can learn to be a self-regulating individual).
Later on, Wolfgang and Glickman (1986) conceived another framework to explain teachers' beliefs toward classroom management. This framework was the basis for the Beliefs on Discipline Inventory (BDI). It consists of three parts: prediction items (3 questions), forced choice items (12 questions) and self-scoring and interpretation (3 steps). This last part includes comparing results of the forced choice part with the predictions made in part 1. Similarly to the PCI form, it is based on a teacher-student control continuum, which illustrates three approaches to classroom interaction: non-interventionists (low teacher control-high student control), interventionists (high teacher control-low student control), and interactionalists (equal teacher control-equal student control).
In 1993, Nancy Martin and Beatrice Baldwin presented a new questionnaire based on both of the instruments previously described, the Pupil Control Ideology form and the Beliefs on Discipline Inventory. It was called the Inventory of Classroom Management Style (ICMS). It used the same continuum as the BDI, from a most non-interventionist approach to a most interventionist approach with a mid-point (interactionalist approach).
The ICMS has 48 Likert-type items and the idea of its format was taken from the PCI questionnaire, but with different descriptors. The novelty of this instrument, unlike its predecessors, was the holistic point of view regarding classroom management, grouping items into three dimensions: person, instruction and discipline. The focus was removed from discipline, considering classroom management as "a multi-faceted construct [...] a broad, umbrella term that includes, but is not limited to, discipline concerns" (Martin & Baldwin, 1993, p. 4).
One year later, Nault (1994) created an inventory called Questionnaire on Classroom Management in Early Childhood Education (QCME), addressed specifically to teachers who teach young children. It consists of 100 items distributed unequally within four dimensions related to planning, organization, intervention, and evaluation.
More than a decade later, Pearson Education Canada Inc. (2005) launched an updated version of the Beliefs on Discipline Inventory with a quite similar name: Beliefs about Discipline Inventory. This questionnaire does not present the three parts that its predecessor (BDI) had, but only one section that resembles part 2 of the earlier version of the inventory. It keeps just the forced-choice part, rewording the same 12 statements with dichotomous answers (a or b).
More modern inventories include the one developed by Webster-Stratton (2012) to assess teachers' performance when applying a training program with young children. The Teacher Classroom Management Strategies Questionnaire has four sections with different scales for each one. It has very specific and comprehensive items intended to find out the usefulness and frequency of use of a variety of classroom management techniques, supposedly applied by teachers who are taking the course, especially those related to discipline, work with parents, and planning. The most recent instrument found is the one developed by Awad (2016); it is a simple 14-item questionnaire with a Likert-type scale that measures teachers' views on their classroom management competencies, on the quality of their pre-service training, and on the in-service support from their schools.
Appraisal of Classroom Management Instruments
One of the weaknesses detected in some of the inventories analyzed is the language used to formulate the items. Let us take the case of the PCI form. Just 2 of the 20 items convey a positive sense when read. The remaining 18 items convey a quite negative message when referring to student misbehavior, persistently highlighting discipline and order, which does not depict current views on classroom management. It may be evident enough to teachers what responses are expected from them, even though the questionnaire is anonymous, which may lead to unreliable answers.
Like the PCI form, the BDI is highly focused on disciplinary aspects without taking into account that interaction with students implies a lot more than just that area. Noteworthy is the inventory's layout, especially part 2, where dichotomous statements force teachers to decide between two extreme views, leaving no room for intermediate positions. Another flaw is the absence of categories or dimensions. In some instruments there is no guiding or logical thread among items. Meanwhile, inventories that do include these aspects do not have items organized into categories or dimensions, resulting in mixed questions that seem disconnected, like loose statements referring almost entirely to discipline aspects and leaving aside other important areas of classroom management. This is especially true in the case of instruments with few questions. Some of the modern instruments described, which contribute more characteristics than just the discipline area, suffer from being either too long, as is the case of Nault's (1994) QCME inventory with 100 items, or too short, as is the case of the most recent questionnaire found, developed by Awad (2016), with just 14 items.
Life at school involves a variety of aspects. Classroom management, as has already been said, involves almost all of teachers' actions. Taking into account the historical background reviewed, the need for a new instrument that better depicts our times and the current understanding of the classroom management construct has become evident.
Methodology
Research Design
This is a non-experimental and descriptive study. It is also cross-sectional, because the data was collected in one specific period of time.
Research Participants
An early version of the Classroom Management Questionnaire was given to a review board of language experts, to be rated in order to evaluate its validity. The review board was formed by 12 experts in the field. Of the 12 expert participants, 8 were women, representing 67% of the total.
The Classroom Management Questionnaire was applied to 31 English teachers, 81% of whom were between 21 and 30 years old. Out of the 31 participants, 24 were women, representing 77% of the total. Additionally, most of the participants who answered the CMQ taught in public secondary schools, with a few participants working at two different school levels.
Instrument
The questionnaire had a Likert-type modality going from "Rarely" to "Usually," and 60 items were distributed equally within three main dimensions: discipline, teaching and learning, and personal. Each dimension was made up of 20 items. The items were mainly adapted from different sources: questionnaires addressed to teachers who teach young learners and classroom management books. Below are the four sources used in the design of the CMQ:
». Questionnaire on Classroom Management in Early Childhood Education (QCME) (Nault, 1994)
». Teacher Classroom Management Strategies Questionnaire (Webster-Stratton, 2012)
». A Handbook for Classroom Management that Works (Marzano, Foseid, Foseid, Gaddy, & Marzano, 2005)
». Classroom Management Techniques (Scrivener, 2012)
Some of the items that were exclusively applicable to young learners were adapted and reworded to make them more general. Likewise, some items were created on the basis of the introduction of new technologies into the classroom, such as the Internet and social networks.
Type of Statistical Analysis
Validity and reliability are two fundamental elements in the validation of a questionnaire. Validity is the extent to which an instrument measures what it is intended to measure. Reliability refers to the overall consistency of an instrument (Tavakol & Dennick, 2011). For this study, three statistical techniques were used; they are briefly described below.
Delphi Technique
The Delphi technique is a widely used and accepted method for gathering data from respondents within their domain of expertise. Basically, consensus on a topic can be reached if a certain percentage of the votes fall within a specific range. The use of mean scores, based on a Likert-type scale, is strongly favored. The mean appears to be inherently best suited to reflect the resultant convergence of opinion. It has been suggested that the mean has to be at 3.25 or higher to reach a consensus on a certain topic (Hsu & Sandford, 2007).
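The mean-based consensus rule described above can be illustrated with a minimal Python sketch. It assumes that each item receives one score from 1 to 4 per rater; the function name, the data layout, and the example ratings are hypothetical and are not taken from the study.

```python
from statistics import mean

CONSENSUS_MEAN = 3.25  # threshold cited in the text (Hsu & Sandford, 2007)


def items_below_consensus(ratings, threshold=CONSENSUS_MEAN):
    """Return {item_id: mean} for items whose mean rating is below threshold.

    ratings maps an item id to the list of scores (1 to 4) given by the raters.
    """
    return {
        item: round(mean(scores), 2)
        for item, scores in ratings.items()
        if mean(scores) < threshold
    }


# Hypothetical ratings from 12 raters for two items (not the study's data).
example = {
    1: [4, 4, 3, 4, 4, 3, 4, 4, 4, 3, 4, 4],  # mean 3.75, reaches consensus
    2: [3, 2, 3, 4, 3, 3, 2, 3, 4, 3, 3, 3],  # mean 3.00, flagged for revision
}
flagged = items_below_consensus(example)
print(flagged)
```

In a sketch like this, the flagged items would be the candidates for rewriting or relocation, while items at or above the threshold would be kept as written.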
Fleiss' Kappa Technique
Fleiss' Kappa evaluates the concordance or agreement between multiple raters. It is a measure of the degree of agreement that is achieved above what would be expected by chance. Agreement can be thought of as follows: if a fixed number of people assign numerical ratings to a number of items, then the Kappa gives a measure of how consistent the ratings are. Table 1 describes the benchmark scale that Landis and Koch (1977) proposed, one of the most widely used benchmark scales to interpret the degree of agreement between raters as a function of Kappa.
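As an illustration of how the coefficient is obtained, the following Python sketch computes Fleiss' Kappa from a table of category counts and attaches a Landis and Koch label. It is a sketch under stated assumptions, not the procedure used in the study: the function names and the example counts are hypothetical, and the label boundaries follow the commonly cited Landis and Koch bands, which should be checked against Table 1.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a subjects-by-categories table of rating counts.

    counts[i][j] is the number of raters who placed subject i in category j;
    every row must sum to the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")

    n_categories = len(counts[0])
    total = n_subjects * n_raters
    # Share of all ratings that fell in each category.
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    # Observed agreement per subject, averaged over subjects.
    p_obs = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_subjects
    # Agreement expected by chance.
    p_exp = sum(p * p for p in p_cat)
    if p_exp == 1:
        return float("nan")  # undefined when every rating is in one category
    return (p_obs - p_exp) / (1 - p_exp)


def landis_koch_label(kappa):
    """Map a kappa value to the commonly cited Landis and Koch (1977) bands."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical counts: 3 statements, 12 raters, 4 categories (scores 1 to 4).
example_counts = [
    [0, 0, 2, 10],
    [0, 1, 5, 6],
    [1, 3, 4, 4],
]
kappa = fleiss_kappa(example_counts)
print(round(kappa, 3), landis_koch_label(kappa))
```

The sketch treats the upper bound of each band as inclusive, which is an assumption made here to handle values that fall between the two-decimal bands.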
Cronbach's Alpha
This technique is a measure of the internal consistency of tests or questionnaires and is used to estimate their reliability. It is commonly used in questionnaires with multiple Likert questions whose answers are neither correct nor incorrect; instead, each participant chooses the alternative that best depicts his or her own views on the construct being explored. Internal consistency refers to the extent to which a set of items in a questionnaire measures the same concept or construct that it is intended to measure and, therefore, it is connected to the inter-relatedness of the items within the test. If the items in a questionnaire are correlated with each other, the alpha value increases. These values range between 0 and 1, in which 0 means "no reliability at all" and 1 means "total reliability." The closer the alpha value is to 1, the higher the inventory's reliability. Table 2 represents the values proposed by George and Mallery (2003).
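The coefficient described above can be sketched in Python from the standard formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The study itself used SPSS; the function names below are hypothetical, the example responses are invented, and the labels follow the commonly cited George and Mallery rules of thumb, which should be checked against Table 2.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items table of Likert scores."""
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    # Sum of the variances of each item (column).
    item_var_sum = sum(variance(col) for col in zip(*responses))
    # Variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - item_var_sum / total_var)


def george_mallery_label(alpha):
    """Map alpha to the commonly cited George and Mallery (2003) labels."""
    for lower, label in [(0.9, "excellent"), (0.8, "good"), (0.7, "acceptable"),
                         (0.6, "questionable"), (0.5, "poor")]:
        if alpha > lower:
            return label
    return "unacceptable"


# Hypothetical answers from 5 respondents to 3 items (not the study's data).
example_responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 3, 3],
    [4, 3, 4],
    [4, 4, 4],
]
alpha = cronbach_alpha(example_responses)
print(round(alpha, 3), george_mallery_label(alpha))
```

Because alpha is a ratio of variances, using the sample variance for both the items and the totals gives the same result as using the population variance for both.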
Data Analysis and Discussion
Specific objective 1
To validate the CMQ using two member checking techniques (Delphi and Fleiss' Kappa).
Delphi technique applied to the CMQ
The instrument was evaluated by a total of 12 language experts who rated the clarity, coherence, and relevance of each one of the statements from one to four points on a Likert-type scale. Each criterion is understood as follows:
». Clarity. The item is easily understood, that is, its syntax and semantics are appropriate.
». Coherence. The item shows a logical relationship with the aim or indicator it is measuring.
». Relevance. The item is essential or important, that is, it has to be included in the instrument.
The statements were assessed by the raters under the following categories: 1) does not meet the criterion; 2) low level; 3) moderate level; and 4) high level, as shown in Table 3 below.
The instrument was separated into three dimensions (discipline, teaching and learning, and personal) in order to analyze it through the Delphi technique.
Discipline dimension analysis
As stated previously, the suggested mean for an item to be accepted as appropriate (clear, coherent, relevant) is 3.25 or higher. Every item was considered appropriate by the expert raters, with the exception of items 15, 16, and 20. These three items, which scored below the suggested mean, had to be revised in order to fulfill the characteristics of a properly written item. Additionally, items 16 and 20 were relocated to enhance the coherence among items within this dimension, so that the sequence would be logical to a participant reading the questionnaire. Items that narrowly surpassed the suggested mean score were also rewritten, as in the case of item 6.
Table 4 shows the lowest mean score of the answers provided by the participants for the items belonging to the Discipline dimension.
The changes made followed the comments and suggestions given by the specialized raters. Most of them suggested writing the pronoun "I" before every item instead of having the pronoun in the introductory statement at the beginning of each dimension, as it was in the first version of the CMQ. Table 5 shows the changes made to the mentioned items and how the revised statement was rewritten.
Teaching and Learning dimension analysis
Table 6 shows the lowest mean score of the answers provided by the subjects for the items belonging to the Teaching and Learning dimension. In this case, the expert raters considered every item appropriate, with the exception of item 39. This item, which scored below the suggested mean in terms of clarity, had to be revised in order to fulfill the characteristics of a properly written item. The changes made, following the comments and suggestions given by the expert raters, are shown in Table 7.
Personal dimension analysis
In the case of the Personal dimension, the specialized participants considered every item as appropriate. Therefore, no changes were made to any of the items belonging to this section.
Fleiss' Kappa applied to the CMQ
The instrument was evaluated by a total of 12 experts who rated the clarity, coherence, and relevance of each one of the statements from one to four points on a Likert-type scale. The statements were classified by the raters under the following categories: 1) does not meet the criterion; 2) low level; 3) moderate level; and 4) high level. The instrument was separated into three dimensions (Discipline, Teaching and Learning, and Personal) in order to analyze it with the Fleiss' Kappa coefficient.
Discipline dimension analysis
Table 8 shows the items with the lowest Fleiss' Kappa coefficient in the Discipline dimension. As stated above, a Kappa value between 0.41 and 0.60 indicates a moderate agreement level, while values from 0.61 to 0.80 and from 0.81 to 1.00 indicate substantial and almost perfect agreement levels, respectively. Therefore, according to the Fleiss' Kappa coefficient applied to the instrument, there is a moderate, a substantial, or an almost perfect agreement among raters in every item, with the exception of items 6, 15, 16, and 20, on which experts reached only a fair agreement.
Teaching and Learning dimension analysis
In this dimension, there is a moderate agreement, a substantial agreement or an almost perfect agreement among raters in every item, with the exception of items 38 and 39, with a fair agreement among experts, as shown in Table 9.
Personal dimension analysis
In the case of the Personal dimension, there is a moderate agreement, a substantial agreement or an almost perfect agreement among raters in almost every item. Table 10 shows that there is just one exception in item 49, where the agreement level among raters is considered as fair.
Cronbach's alpha applied to the CMQ
The data collected was processed using the SPSS Statistics program created by IBM. As stated before, a Cronbach's alpha value higher than 0.90 indicates an excellent internal consistency level, while values between 0.80 and 0.90 indicate a good level of internal consistency. The reliability statistics yielded a Cronbach's alpha of 0.904 for the instrument as a whole, which indicates that the questionnaire has excellent internal consistency and is, therefore, highly reliable.
The instrument was also analyzed with the Cronbach's alpha technique separately into its three dimensions: Discipline, Teaching and Learning, and Personal.
The analysis of the 20 items that formed the Discipline dimension yielded a Cronbach's alpha value of .811, which is considered good.
When the Cronbach's alpha was calculated for the 20 items forming the Teaching and Learning dimension, the result was .860, which is also considered good.
The Cronbach's alpha corresponding to the Personal dimension was .884, which is likewise considered a good value for the dimension.
Some interesting values were found while performing the item-per-item analysis. According to Gliem and Gliem (2003), the minimum value for an item to be considered correlated with the total test score is between 0.35 and 0.40. Values below this range indicate a low level of correlation. According to the results shown in Table 15, the item-test correlation works well in general terms. However, a considerable number of items fall below the minimum value of 0.35 (Gliem & Gliem, 2003) required to be considered correlated with the total test score. These are items 4, 6, 7, 8, 9, 12, 16, 17, 19, 20, 26, 32, 33, 35, 36, 37, 38, 41, 42, 44, 52, 54, and 58.
It is worth noting that the last column of Table 15 shows the Cronbach's alpha value that would result if each item were deleted. Removing only items 6, 7, 8, 9, 33, and 37 would increase the alpha coefficient. Such is the case of item 8: if it were deleted, the Cronbach's alpha would increase to 0.912. As seen, removing only 6 out of 60 items would increase the alpha coefficient to some degree. Nonetheless, that does not mean that these items should be deleted. One of the factors that may have influenced these figures is the fact that research participants were mainly novice teachers with little teaching experience. This is evidenced especially in classroom management techniques that have to do with the Discipline dimension, where the lowest values were obtained.
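The two item-level statistics discussed above, the item-total correlation and the alpha value obtained when an item is deleted, can be sketched in Python as follows. This is a minimal illustration, not a reproduction of the SPSS output in Table 15: the function names and the example data are hypothetical, and the sketch assumes the corrected form of the item-total correlation, in which each item is correlated with the sum of the remaining items.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items table."""
    k = len(rows[0])
    item_var_sum = sum(variance(col) for col in zip(*rows))
    return (k / (k - 1)) * (1 - item_var_sum / variance([sum(r) for r in rows]))


def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def item_analysis(rows):
    """Per item: corrected item-total correlation and alpha if item deleted."""
    report = []
    for j in range(len(rows[0])):
        item = [r[j] for r in rows]
        rest = [r[:j] + r[j + 1:] for r in rows]  # the scale without item j
        report.append({
            "item": j + 1,
            "item_total_r": pearson(item, [sum(r) for r in rest]),
            "alpha_if_deleted": cronbach_alpha(rest),
        })
    return report


# Hypothetical answers from 5 respondents to 4 items (not the study's data).
# Item 4 is deliberately out of step with the other three.
data = [
    [1, 1, 2, 4],
    [2, 2, 2, 1],
    [3, 3, 3, 3],
    [4, 4, 3, 2],
    [4, 4, 4, 1],
]
full_alpha = cronbach_alpha(data)
report = item_analysis(data)
for row in report:
    print(row["item"], round(row["item_total_r"], 3), round(row["alpha_if_deleted"], 3))
```

In this invented example, the item that correlates poorly with the rest of the scale is also the one whose removal raises alpha, which mirrors the pattern described for the low-value items. As the text notes, such a result is a prompt for closer inspection rather than an automatic reason to delete the item.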
Conclusions
The purpose of this study was to approach the validation of a questionnaire to identify classroom management techniques used by pre- and in-service teachers of English. The Delphi and Fleiss' Kappa techniques were used to approach the validation of the CMQ. The Cronbach's alpha technique was used to estimate the CMQ's reliability through its internal consistency coefficient.
Once the two member-checking techniques, Delphi and Fleiss' Kappa, were applied, and after a review board of 12 expert raters evaluated the questionnaire, it was concluded that almost every item was considered appropriate by the raters in terms of clarity, coherence, and relevance. The exceptions were items 15, 16, 20, and 39, that is, only 4 out of 60 items. Consequently, these items were rewritten and/or relocated. Once the Fleiss' Kappa technique was applied, it was concluded that there was either a moderate, a substantial, or an almost perfect agreement between raters, with the exception of items 6, 15, 16, and 20 from the Discipline dimension, items 38 and 39 from the Teaching and Learning dimension, and item 49 from the Personal dimension. In total, raters reached only a fair agreement regarding the clarity, coherence, and relevance of 7 out of 60 items. After applying these two member-checking techniques, the conclusion is that this research objective was achieved. The suggested item modifications were made, and a revised version of the Classroom Management Questionnaire was obtained.
The other research objective was to estimate the CMQ's internal consistency using the Cronbach's alpha technique, which was applied after the questionnaire was answered by the 31 English teachers and teachers-to-be who participated in the study. Once Cronbach's alpha results were obtained, it was concluded that, overall, the questionnaire had excellent internal consistency and was, therefore, highly reliable. The item-per-item analysis revealed that a considerable number of items did not have a good level of correlation with the total score. However, that does not mean that those items should be deleted. Removing only items 6, 7, 8, 9, 33, and 37, that is, 6 out of 60, would increase the alpha coefficient to some degree. One of the factors that may have influenced these figures was the fact that research participants were mainly novice teachers with little teaching experience, especially in using classroom management techniques that have to do with the Discipline dimension.