Research over recent years suggests that studying co-speech gestures can be very useful for understanding the relationship between language and thought (Goldin-Meadow & Alibali, 2013; McNeill, 2005). Gestures can change speakers' mental representations (Beilock & Goldin-Meadow, 2010) and can even influence the way we think. Gestures not only reflect the thought of the person who produces them, but can also feed back into that thought (Goldin-Meadow & Beilock, 2010). Gestures are actions, but at the same time they represent information. In sum, they play an important role, as they add information to the mental representation and show concrete actions as abstract ideas (Goldin-Meadow, 2015). However, why do people make gestures while speaking? Why do people talk with their hands? Do these hand movements have any special significance?
Since the late 1980s, Susan Goldin-Meadow and colleagues have also shown that co-speech gestures play a role in facilitating the development of knowledge and indicate a state of transition towards this new knowledge. According to them, when solving a variety of problems, children and adults make gestures that contain information different from, and complementary to, the information conveyed by speech. They call this phenomenon gesture-speech mismatch (Alibali & Goldin-Meadow, 1993; Church & Goldin-Meadow, 1986; Goldin-Meadow, 2011; Perry, Church, & Goldin-Meadow, 1988; Perry & Elder, 1997). The simultaneous activation of multiple ideas or strategies about the solution of a problem characterizes this state of transition of knowledge and is the cause of the gesture-speech mismatch. In other words, gesture and speech convey two different ideas about the same problem: gesture conveys one idea, speech conveys another.
This is what Garber (1997) and Garber and Goldin-Meadow (2002) found in their research on the solution of the Tower of Hanoi problem. This is a puzzle in which a graduated tower of disks must be moved from a source peg to a goal peg. The largest disk is at the bottom and the smallest at the top. There are two rules (Egan & Greeno, 1973): first, only one disk may be moved at a time, and second, a larger disk must not be placed on top of a smaller one. The most efficient solution to the problem is reached by repeatedly comparing the current state of the disks with intermediate and final goal states (Newell & Simon, 1972).
They found that when participants explain the solution to the Tower of Hanoi (TOH) problem, gesture-speech mismatches indicate that the problem-solver has two different strategies in mind for accomplishing the task: the first considered consciously through speech, the second less consciously through gesture. From this, they formulated the hypothesis that participants would produce more gesture-speech mismatches when explaining key moments or choice points of the solution to the problem (and not at other moments). This would indicate that at these choice points, participants must choose between two different possible strategies for an optimal solution of the task. These choice points, predetermined by the problem, become decision moments at which the participant must carefully analyze the next disk move. In other words, at those choice points participants have two options: one that leads to solving the problem in the fewest moves (optimal solution), and another that takes more moves (non-optimal solution).
Garber (1997) and Garber and Goldin-Meadow (2002) found that participants (children and adults) did produce more gesture-speech mismatches at those choice points. This suggests that gesture-speech mismatches could be a “good” indicator of planning for the solution of the TOH problem: the mismatches happen precisely at those choice points where participants have to choose between two different strategies and, to do so, have to anticipate the resolution of the problem. They also found a link between the level of complexity of the task (represented by the number of disks) and a high production of gesture-speech mismatches. However, they did not find significant differences in the number of mismatches produced by participants; i.e., children and adults produced mismatches in equal proportion.
In the context of these findings, two issues aroused our interest. The first was the relationship between gesture-speech mismatches and planning suggested by Garber (1997) and Garber and Goldin-Meadow (2002). The second was their finding that there was no significant difference in the number of mismatches produced by children and adults.
Planning, for its part, has been studied from different theoretical approaches: behaviorist (Berger, Guilford, & Christensen, 1957; Miller, Galanter, & Pribram, 1960; Mumford, Schultz, & Van Doorn, 2001; Osburn & Mumford, 2006); cognitivist (Anderson, 1993; Newell & Simon, 1972; Reed, 1999; Richard, 1982, 1988, 1997, 2004; Rönnlund, Lövdén, & Lars-Göran, 2001); and finally, neuropsychological. Under this last approach, planning is part of the executive functions, executive control, or cognitive control (Anderson, 1998; Aran, 2011; Barceló, Lewis, & Moreno, 2006; Barroso-Martin & Leon-Carrion, 2002; Blaye & Chevalier, 2014; Chevalier, 2010; De Luca et al., 2003; Díaz et al., 2012; Diamond, 2013; Hughes & Graham, 2002; Lezak, 1995; Miyake et al., 2000). According to Diamond (2013), executive functions “refer to a family of top-down mental processes needed when you have to concentrate and pay attention, when going on automatic or relying on instinct or intuition would be ill-advised, insufficient, or impossible” (p. 136). She classifies planning (along with reasoning and problem solving) as a higher-order executive function. Under this approach, planning is therefore the anticipation of a sequence of different actions before they are implemented.
Richard (1982) claims that the TOH task is one of the most suitable state-transformation problems for studying planning. The reason is that setting intermediate goals and searching for the conditions required to achieve them are critical stages in solving the task. Other studies come to the same conclusion (Anderson & Douglas, 2001; Aran, 2011; Byrnes & Spitz, 1979; De Luca et al., 2003; Díaz et al., 2012; Welsh, 1991). According to this research, the planning capacity needed for these types of tasks is rare before the age of seven and would only be possible by the age of eight. The literature shows a fast development of this planning capacity between the ages of eight and nine, then a plateau between the ages of 9 and 12, followed by an important development between the ages of 12 and 14, and finally another plateau (Blaye & Chevalier, 2014; Byrnes & Spitz, 1979; Chevalier, 2010; De Luca et al., 2003; Richard, 1982; Welsh, Pennington, & Groisser, 1991).
More recent studies have used the TOH task to explore the relationship between gestures and problem solving. They try to demonstrate that gestures are not simple reproductions of actions already carried out (explaining the TOH task after having moved the disks) but complex representations of problem solving (Beilock & Goldin-Meadow, 2010; Cook & Tanenhaus, 2009; Goldin-Meadow & Beilock, 2010; Trofatter, Kontra, Beilock, & Goldin-Meadow, 2015). Nevertheless, none of these studies has related the production of gesture-speech mismatches to planning through the resolution of the TOH problem.
As mentioned above, Garber (1997) and Garber and Goldin-Meadow (2002) used the TOH to analyze the gesture-speech mismatches produced by children and adults while they explained the solution of the puzzle after they had solved it. Our interest in the process of planning led us to focus on the role of gesture-speech mismatches in planning. That is why our aim was to study the gesture-speech mismatches produced by children and adults while they explained the resolution of the TOH before they implemented it. We explored this across different ages, chosen according to previous studies on the development of planning, and across two levels of task complexity (three and four disks).
Method
Participants
Adults
Forty-eight college students (M = 19 years; range = 18 to 20 years), 24 males and 24 females, recruited from different colleges and enrolled in different schools (Engineering, Psychology, International Business, Medicine, Nursing, etc.) participated voluntarily and were interviewed after having signed the informed consent.
Adolescents
Forty-eight school pupils (M = 12 years; range = 12 to 14 years), 24 males and 24 females, were recruited from different schools. They participated in the task after obtaining informed consent from their parents.
Children
Forty-eight school pupils (M = 9 years; range = 8 to 10 years), 24 males and 24 females, were recruited from different schools. They participated in the task after informed consent had been obtained from their parents.
All the participants live in a region of the Colombian Caribbean coast. None of them received compensation for their participation. Three conditions were required to include participants in this study: a) not presenting scholarly or academic difficulties; b) not presenting psychological and/or visual difficulties which could prevent them from taking the test; and c) not knowing the TOH task. These conditions were verified by the school in the case of the adolescents and children or by the adult students themselves.
Material and task
Device
Our material was the TOH puzzle. It was composed of a flat wooden base which measured 20 x 10 cm, three vertical rods (A, B, and C) which measured 8 cm, and wooden disks of different colors (yellow, red, green, and blue). The smallest disk measured 3 cm in diameter. The other disks were increasingly bigger. The disks were named according to their sizes: disk 1, 2, 3, and 4. Disk 1 was the smallest and disk 3 was the biggest in the task version with 3 disks; disk 1 was the smallest and disk 4 the biggest in the task version with 4 disks.
Rules
The task consisted of moving the disks from a starting tower (Rod A) to a finishing tower (Rod C) through an intermediate tower (Rod B), respecting the following rules: (1) move only one disk at a time; (2) do not hold a disk in the hand or put it on the table (every disk should always be on one of the rods); and (3) never put a bigger disk on a smaller one (Piaget, 1974; Welsh, Satterlee-Cartmell, & Stine, 1999). Figure 1 shows the problem-space diagram (Newell & Simon, 1972) of the task with four disks. The minimum number of moves required to solve any version of the TOH task optimally is 2^n - 1, where n is the number of disks (7 moves for the three-disk task and 15 for the four-disk task).
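The rules above, and the minimum of 2^n - 1 moves (7 for three disks, 15 for four), can be checked with a short recursive sketch. The function name `solve_toh` and the representation of a move as a (disk, from_rod, to_rod) tuple are illustrative assumptions and were not part of the study's materials.

```python
def solve_toh(n, source="A", target="C", spare="B"):
    """Return the optimal list of (disk, from_rod, to_rod) moves for n disks.

    Disk 1 is the smallest; rod labels follow the A/B/C naming of the task.
    """
    if n == 0:
        return []
    # Move the n-1 smaller disks onto the spare rod, move disk n to the
    # target, then restack the n-1 smaller disks on top of it.
    return (
        solve_toh(n - 1, source, spare, target)
        + [(n, source, target)]
        + solve_toh(n - 1, spare, target, source)
    )


for n_disks in (3, 4):
    moves = solve_toh(n_disks)
    # The length of the optimal solution equals 2**n - 1.
    print(n_disks, len(moves), 2**n_disks - 1)
```

For the three-disk task the first move returned is disk 1 from rod A to rod C, which is the opening move of the optimal path in the problem space.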
Procedure
Training phase
The training process began with two disks on the first rod. The experimenter explained the rules, accompanying the explanation with hand gestures while pointing at the disks. After that, the experimenter asked participants to repeat the instructions and rules, and then to explain how to solve the problem without moving the disks. Finally, the participant was asked to move the disks. After completion of the training task with two disks, the experimenter set the disks on the first rod and explained to participants that the game would continue with the same procedure as in the task with two disks, but this time with a third disk and finally a fourth disk added.
Development of the task
First, participants were invited to explain their task resolution without moving the disks, and were then asked to perform the task by moving the disks. Only the planning phase, not the movement of the disks, was analyzed here. Figures 2 and 3 illustrate, respectively, the experimenter's explanation of the task with 3 disks (T3d) and the beginning of a participant's explanation of the task with 3 disks (T3d):
To explain each problem (with three, then four disks), a type of paradigm “controlled” by the learner was chosen, i.e., there was no time limitation or imposition of test explanations on the task (Clément & Richard, 1997). However, after three unsuccessful attempts to explain the task, the difficulties of participants were taken into account and they were invited to perform the task. The four-disk task was the maximum degree of complexity for all participants. At the end of every session, two questions were asked: “What do you think of the test?” and “What strategies did you apply to solve the task?”
The planning phase was completed only when the participants had completed the task through their explanation, that is, when they had reconstructed the tower in their explanation (with three and four disks) in the same order on the last rod (rod C).
All participants were tested individually in a session of 15-20 minutes on average. In the case of children, sessions took place in an empty room made available by the school (meeting rooms, classrooms, library, conference room, music studio, or coordinating school office). With adults, sessions were held in a psychology laboratory at the University. Sessions with both adults and children were filmed in their entirety with the respective authorizations (following the Code of Ethics of Colombia).
Coding
The 144 video recordings were fully transcribed and coded according to the system described by Garber (1997) and Garber and Goldin-Meadow (2002).
Thus, a coder first transcribed and coded (a) the movement of disks described in speech, (b) the movement described in gestures, (c) the mismatches between gestures and speech, and (d) the manual movement of disks. A gesture was defined as any pointing gesture, or any iconic gesture representing the shape of a disk, made with one or two hands, directed at the TOH, and indicating the displacement of a disk from one rod to another. Private gestures or other hand movements, such as touching or taking a disk, scratching the head, touching the face, etc., were not considered part of the gestures associated with the path of a disk and were therefore not coded (Ekman & Friesen, 1969; Garber & Goldin-Meadow, 2002). In this phase, the coding process had two steps. The first step was (1) to determine the number of “mental” disk movements arising from the verbal and gestural explanations given to solve the task with three disks (3d) and four disks (4d). This allowed for the identification of the type of strategy used by participants. The strategy was optimal when the participant (through the verbal and gestural explanation of the trajectory) solved the task in the lowest number of movements (7 for the task with 3d and 15 for the task with 4d); it was non-optimal when the participant solved the task in a larger number of movements (more than 7 and more than 15 for the tasks with 3d and 4d, respectively). When participants made fewer movements than that but did not reach an optimal plan, i.e., they did not carry the task through to its expected end state, the strategy was also considered non-optimal. For this purpose, the different trajectories of permitted movements in the problem-space diagrams (Newell & Simon, 1972) of the TOH were used. The second step was (2) to identify, for each movement and each verbal and gestural trajectory, the match or mismatch relationship between gestures and speech.
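The optimal versus non-optimal classification described in this coding step can be illustrated with a minimal sketch. The function name `classify_strategy` and the move format are hypothetical, and treating a trajectory that contains an illegal move as non-optimal is an assumption made for the illustration; it is not a rule stated in the coding system.

```python
def classify_strategy(moves, n_disks):
    """Classify an explained trajectory as 'optimal' or 'non-optimal'.

    moves: list of (disk, from_rod, to_rod) tuples, with disk 1 the smallest.
    All disks start on rod A and must end on rod C.
    """
    rods = {"A": list(range(n_disks, 0, -1)), "B": [], "C": []}
    for disk, src, dst in moves:
        # Assumption: a move is legal only if the disk is on top of its
        # source rod and is not placed on a smaller disk.
        if not rods[src] or rods[src][-1] != disk:
            return "non-optimal"
        if rods[dst] and rods[dst][-1] < disk:
            return "non-optimal"
        rods[dst].append(rods[src].pop())
    solved = rods["C"] == list(range(n_disks, 0, -1))
    # Optimal: the tower is rebuilt on rod C in exactly 2**n - 1 moves.
    if solved and len(moves) == 2**n_disks - 1:
        return "optimal"
    # Longer solutions and unfinished explanations are both non-optimal.
    return "non-optimal"


optimal_3d = [(1, "A", "C"), (2, "A", "B"), (1, "C", "B"), (3, "A", "C"),
              (1, "B", "A"), (2, "B", "C"), (1, "A", "C")]
print(classify_strategy(optimal_3d, 3))
print(classify_strategy(optimal_3d[:3], 3))
```

The second call shows the case of a shorter explanation that never reaches the end state, which the coding scheme also counts as non-optimal.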
Matches and mismatches were defined as in Garber (1997) and Garber and Goldin-Meadow (2002). Figures 4 and 5 present examples of a gesture-speech match and a gesture-speech mismatch.
When explaining the task with three disks, a girl (Figure 4) described her first movement of the green disk from rod A to rod C, by saying: "I would take this [the green disk on rod A] and I would place it here [rod C]". At the same time, she accompanied her explanation by illustrating her verbal message and indicating with her gesture the rod A with her left hand, and the rod C with her right hand. In this case, both verbalization of movement 1 and her gesture referred to the same disk and the same rod. This was a typical case of match between the explanation of the first movement indicated verbally and the explanation of the non-verbal movement, indicated with gestures.
In contrast, Figure 5 illustrates one of the three cases of mismatch (Garber & Goldin-Meadow, 2002). The gesture conveyed a different trajectory from the one identified in speech. For example, in the three-disk task, the participant described the first movement of the green disk from the first rod to the third rod by saying: “I move the green disk to a rod” (the participant did not specify which rod), and then the participant said, “the rod C”, while pointing at rod B, the middle rod.
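The two examples above can be expressed as a small comparison routine. This is only a sketch of the case illustrated in Figure 5, where speech and gesture name different elements of the same move; the other cases of mismatch in the coding system are not modelled. The function name `code_move` and the dictionary keys are hypothetical.

```python
def code_move(speech, gesture):
    """Label one explained move as 'match' or 'mismatch'.

    speech, gesture: dicts that may contain 'disk', 'source' and 'target',
    holding what each modality referred to (a missing key means unspecified).
    """
    for key in ("disk", "source", "target"):
        said, shown = speech.get(key), gesture.get(key)
        # Assumption: only elements expressed in both modalities are compared.
        if said is not None and shown is not None and said != shown:
            return "mismatch"
    return "match"


# Figure 4: speech and gesture both go from rod A to rod C.
figure_4 = code_move(
    {"disk": "green", "source": "A", "target": "C"},
    {"source": "A", "target": "C"},
)
# Figure 5: speech names rod C while the gesture points at rod B.
figure_5 = code_move({"disk": "green", "target": "C"}, {"target": "B"})
print(figure_4, figure_5)
```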
In this phase, participants generated several attempts at explanation (verbal and gestural) prior to their final explanation. The last attempt to explain each task was coded and analyzed. However, the number of attempts produced by participants before the final attempt was also recorded, as the final attempt was not necessarily the one that led to the optimal resolution of the task. Finally, the explanation time in seconds was registered, i.e., the time between the end of the instruction given by the experimenter and the end of the participant's last attempt at explanation.
Reliability
Double coding was performed on 12.5% of the data (18 participants: 6 children, 6 adolescents, and 6 adults). Reliability was established by a second evaluator who independently transcribed and coded the following variables for the TOH task with 3d and 4d: average speech movements (NDV), average gestural movements (NDG), and average gesture-speech matches/mismatches (MM). Inter-rater agreement was determined by calculating the proportion of agreement between the two coders and Cohen's kappa coefficient (K). In the 3-disk task, for adults, adolescents, and children, agreement between coders was 1 for describing moves in speech (kappa = 1), 1 for describing moves in gesture (kappa = 1), and 1 for describing gesture-speech matches and mismatches (kappa = 1). In the 4-disk task, for adults, agreement between coders was 1 for describing moves in speech (kappa = 1), 1 for describing moves in gesture (kappa = 1), and 1 for describing gesture-speech matches and mismatches (kappa = 1). For adolescents, the comparable numbers were 77 (1), 77 (0.94), and 77 (0.88), and for children they were 80 (1), 80 (0.94), and 80 (0.88).
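For readers unfamiliar with the statistic, the following sketch shows how the proportion of agreement and Cohen's kappa are computed from two coders' labels for the same items. It is a generic textbook formulation rather than the software used in the study, and the function name `cohens_kappa` and the example labels are invented for illustration.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders' categorical labels on the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty item set")
    n = len(coder_a)
    # Observed proportion of agreement between the two coders.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's marginal frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1:
        return 1.0  # both coders used one and the same category throughout
    return (p_o - p_e) / (1 - p_e)


# Invented labels: the coders disagree on the last of four moves.
first_coder = ["match", "match", "mismatch", "mismatch"]
second_coder = ["match", "match", "mismatch", "match"]
print(cohens_kappa(first_coder, second_coder))
```

In this invented example the coders agree on three of four items (proportion 0.75), chance agreement is 0.5, and kappa is therefore 0.5, which shows why kappa can be well below the raw agreement proportion.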
The following hypotheses were posed:
An effect of age on mismatches depending on the type of strategy (optimal and non-optimal) was expected. The hypothesis was that the older the participants, the larger the number of optimal resolutions of the task, associated with a high frequency of “mismatches”. This would indicate the ability of participants to anticipate (both in gestures and in speech) the chances of resolving the TOH.
Task complexity was expected to affect mismatches. It was expected that the complexity of the TOH task would have an effect on the production of gesture-speech mismatches in participants' explanations when planning the TOH with 3 and 4 disks.
Results
Is there any effect of the age on production of mismatches?
We considered the number of gesture-speech mismatches produced by participants during their verbal and gestural explanations (according to the classification of Garber, 1997, and Garber and Goldin-Meadow, 2002), and we asked whether there was an effect of age on the production of these mismatches for the types of planning strategies of participants. A two-factor ANOVA with age as a between-subjects factor and type of planning strategy (optimal or non-optimal) as a within-subject factor was performed. The total number of mismatches was taken as the dependent variable. We found no effect of age on the production of gesture-speech mismatches considering the type of planning strategy in the task with 3 disks (F(2, 115) = 0.957, p = 0.387), nor in the task with 4 disks (F(2, 117) = 0.049, p = 0.952). However, the analysis of variance showed a simple effect of type of strategy on the production of gesture-speech mismatches in the task with 3 disks (F(1, 115) = 17.559, p < 0.001), as shown in Figure 6, but not in the task with 4 disks (F(1, 117) = 0.052, p = 0.82).
Is there any effect of the complexity of the task on production of mismatches?
We expected that the complexity of the TOH task would have an effect on the production of gesture-speech mismatches in participants' explanations when planning the solution of the TOH with three and four disks. The analysis revealed a simple effect of task complexity on the number of mismatches (F(1, 121) = 14.501, p < 0.001). The mean number of gesture-speech mismatches was higher in four-disk TOH planning than in three-disk planning. Therefore, this hypothesis was confirmed (cf. Figure 7).
To test this hypothesis in terms of age, a repeated-measures analysis of variance was performed. All subjects (8- to 10-year-old children, 12- to 14-year-old adolescents, and 18- to 20-year-old adults) completed the problem conditions with three and four disks. The dependent variable was the average number of gesture-speech mismatches, the within-subject variable was the complexity of the task, and the between-subjects variable was age. However, the analysis did not reveal any effect of the complexity of the task on the number of mismatches associated with age (F(2, 121) = 0.693, p = 0.502).
Discussion
This research focused on the role of gestures, and particularly the role of gesture-speech mismatches, in the planning of the TOH task. In their research, Garber (1997) and Garber and Goldin-Meadow (2002) found that participants (children and adults) produced more gesture-speech mismatches at choice points while solving the task. This suggested that gesture-speech mismatches could be a “good” indicator of planning for the solution of the TOH problem: the mismatches happen precisely at those choice points where participants have to choose between two different strategies and, to do so, have to anticipate the resolution of the problem. They also found a link between the level of complexity of the task (represented by the number of disks) and the production of gesture-speech mismatches. However, they did not find significant differences in the number of mismatches produced by different kinds of participants; i.e., children and adults produced mismatches in equal proportion.
Starting from these findings, we conducted our research on the assumption that, as in the classic studies on the resolution and planning of the TOH problem, we could confirm the effects of age and task complexity on planning through the study of gesture-speech mismatches. Our interest in the process of planning led us to focus on the role of gesture-speech mismatches in planning. That is why our aim was to study the gesture-speech mismatches produced by children and adults while they explained the resolution of the TOH before they implemented it. We explored this across different ages, chosen according to previous studies on the development of planning, and across two levels of task complexity (three and four disks).
In line with this goal, we formed three age groups and presented the task at two levels of complexity. To our knowledge, no previous research had studied planning by using gesture-speech mismatches from a developmental perspective. Our research sought to fill this gap. We consider that the originality of our research lay precisely in trying to find another way to study planning. We think that this complex process cannot be demonstrated only through the results of the solution of a task such as the TOH. Nevertheless, although our results did not fully confirm our hypotheses, they allowed us to falsify some of them.
We confirmed, as other investigations did, that all participants mostly produced gestures to explain their TOH solution. This shows that gestures can also be studied in advance, during the planning of problem-solving tasks, and not just during explanations that follow execution. In other words, we confirmed the power of gesture for explaining complex cognitive tasks. This has been demonstrated by recent studies (Alibali et al., 2014; Alibali, Church, Kita, Sotaro, & Hostetter, 2014; Chu & Kita, 2016; Kita, Alibali, & Chu, 2017). However, we did not find, as we had expected, any relationship between planning development and the production of mismatches at the choice points. Our participants, regardless of age, did not produce a significant number of mismatches at the choice points while using optimal strategies for solving the TOH task. Many of them could solve the three-disk task using an optimal strategy without producing any mismatch. This was not the case for the four-disk task. For this task, they produced more gesture-speech mismatches, but age made no difference while the complexity of the task did. This suggests that the three-disk task is probably solved by participants automatically, without any cognitive control or planning. This could explain why they did not produce any gesture-speech mismatches. On the other hand, the four-disk task, given its complexity, requires more cognitive organization and planning. In fact, we found significant differences in the production of gesture-speech mismatches when considering the complexity of the task.
Now, we might wonder whether the fact that participants did not produce gesture-speech mismatches at the choice points is a sign of non-planning, or whether mismatches are simply not a good indicator of planning. To answer these questions, it is important to note that planning is related to other cognitive functions that we did not consider in this research: working memory and inhibition. According to Monette and Bigras (2008), planning is closely linked to these two cognitive functions, since working memory is required in order to make a plan through the development of sub-vocalizations or mental images. Inhibition, on the other hand, is necessary because participants must often inhibit a dominant behavior in order to plan. Future studies on planning using mismatches should take these cognitive functions into account to make a deeper analysis of the development of planning. What we suggest here is that a single indicator is not enough for evaluating planning.
Another explanation for the non-confirmation of our hypothesis about the relationship between the production of gesture-speech mismatches and the development of planning could be the protocol we followed in our research. Contrary to the one used by Garber (1997) and Garber and Goldin-Meadow (2002), and since we were interested in planning, we asked our participants to explain the task before moving the disks, a request that demanded from them an enormous capacity for abstraction. More recent studies on gestures with the TOH task have demonstrated, for example, that when confronted with two different situations, one using an actual TOH and the other a digital one, participants produced more gestures when solving the task with the actual TOH (Cook & Tanenhaus, 2009). In the case of the protocol used by Garber and Goldin-Meadow (2002), the fact that participants had had a real experience with the disks before explaining their resolution could favor the production of mismatches and, therefore, the planning of the task. These interpretations remain limited because we do not have other studies on the relationship between gesture-speech mismatches and planning. Recent studies with the TOH task have addressed gestures, but not the production of mismatches (Beilock & Goldin-Meadow, 2010; Cook & Tanenhaus, 2009; Goldin-Meadow & Beilock, 2010; Trofatter et al., 2015).
Nonetheless, for the reasons just explained, gesture-speech mismatches may not be a sufficient indicator for explaining the anticipated representation of the resolution of the task before moving the disks. However, they could predict the subsequent optimal resolution of the task, i.e., subsequent planning. We are currently analyzing our data to test whether participants who produced more mismatches before actually solving the task by moving the disks went on to plan effectively. This would be a great step forward, because we could then talk about the development and learning of planning through gestures. According to Novack and Goldin-Meadow (2017), because gestures are abstract representations and are not actions linked to particular events and objects, they can play a powerful role in thinking and learning beyond the particular, specifically in support of generalization and transfer of knowledge.