Introduction
Human communication is a complex process that can be considered from various perspectives. It depends on several factors, one of which is the interaction between speech perception and speech production. The perceptual-motor relationship is inextricably integrated not only in speech tasks but also in perception, the auditory system, cognition, and language, among others [1-3]. One of the critical aspects of motor speech production is phonation, which plays an essential role in the listener’s speech perception and in the auditory self-voice feedback received as an individual speaks [4].
From the listeners’ perspective, voiced sound carries selective spectral modifications arising from articulatory gestures of the vocal tract, resulting in a signal containing harmonic energy across a wide range of frequencies, covering at least the vocal tract’s first acoustic resonances [5]. Furthermore, vowel formant frequencies and transitions (changes in the formant frequency of a vowel immediately before or after a consonant) can affect the interpretation of vowels and the adjacent consonant, generating the perception of a word [6-11].
Regarding the speaker, auditory self-voice feedback plays a unique role. Once vocalization is initiated, auditory feedback monitors possible acoustic changes that may occur during speech, allowing control over the speaker's vocal and articulatory output [12,13]. The above are examples of the inherent interaction between speech production and speech perception in both speaker and listener. Nonetheless, this phenomenon of interaction between auditory feedback and voice has not yet been incorporated into the routine clinical evaluation of voice problems.
In general terms, a comprehensive voice assessment is based on information from numerous sources, including acoustics, aerodynamics, endoscopy of the larynx, clinical judgment of vocal quality, and the patient’s self-perception of their voice in terms of its quality and impact on their life [14]. Only in recent years has the important role of auditory feedback in voice production begun to be described for both voice assessment and voice therapy; however, some aspects related to auditory-vocal integration are still not widely known by voice clinicians. For this reason, this reflection article has two purposes: 1) to highlight the important link between voice production and voice perception and 2) to consider whether this relationship might be exploited clinically for diagnostic purposes and therapeutic benefit. Existing theories on speech production and its interaction with auditory perception provide context for discussing why the evaluation of auditory-vocal processes could help identify associated origins of dysphonia and inform the clinician about appropriate management strategies.
Speech Perception and Production
Voicing is the primary outcome of the process of speech production. Spoken utterances are then perceived by both the listener and the speaker. The role of auditory input on speech and voice production can be considered from at least two perspectives. First, it is recognized that auditory signals external to the speaker impact how a person produces their voice. The Lombard effect is an example of an external auditory signal that can cause individuals to increase their loudness involuntarily [15]. Also, the Lombard effect causes acoustic and phonetic modifications, including an increase in the fundamental frequency (fo), a change in the first (F1) and second (F2) formants, and an increase in vowel duration [16-18]. Research involving people with Parkinson’s disease (PD) has shown that the Lombard effect could even positively affect voice therapy. When their auditory feedback was altered, participants with PD increased their fo, voice intensity, and stability [19]. A second type of auditory input occurs when a speaker perceives their own voice in near-real-time (self-voice feedback). It is this type of feedback that we are most closely considering within this reflection.
Dating back several decades, researchers have conducted auditory-feedback perturbation studies, shedding light on the role of auditory feedback on voice control. In general, auditory perturbation studies involve altering some aspect of the acoustic signal (e.g., vowel formant) that a person is producing and presenting this in near real-time to the same speaker to see if or how they adjust the speech production [20]. As an example, research has verified that when the fo of one’s own voice is modified and presented to the individual during or before vocalization, a compensatory response engages, in which the person adjusts the intended target fo to match the fo-adjusted stimulus, evidencing that fo responds dynamically to auditory self-voice feedback [21].
Classical models of speech and language production incorporate perception and production as components in their structural features, such as the Broca-Wernicke-Lichtheim model [22]. In recent decades, scientific evidence, based on neuroimaging studies, has been collected. This evidence shows a cortical and subcortical connection related to the self-perception of speech, processing, and language production [23]. New neurocomputational models incorporate interactive networks, or streams, in their structure, allowing for a better understanding of the interaction between perception and speech production and relating comprehension and production processes to ventral and dorsal regions of the brain [24,25].
Several models attempt to explain this phenomenon, and some of their components can be related to voice control and production: the Directions Into Velocities of Articulators (DIVA), the State Feedback Controller (SFC), and, more recently, the SimpleDIVA models [24,26,27].
The DIVA and SFC models are neurocomputational models that describe the speech production process [24]. These models emphasize the role of auditory and sensorimotor feedback in planning speech motor responses. In addition, they incorporate anatomical labels of different brain regions at each stage, or map, which are connected by synaptic projections [24,28,29]. Specifically, the DIVA and SFC models both include auditory feedback that allows detection of speech errors, so that corrections can be attempted and a desired speech motor response is generated [29,30]. The DIVA model encodes movement velocities for the lips, tongue, jaw, and larynx [24,28,29]. The difference is that DIVA relies primarily on feedforward controls, whereas SFC integrates internal predictions through efference copies, allowing for an increased gain during vocalization [26,31].
We focus on the recent proposal of Kearney and colleagues, who described a simplified version of the DIVA model, the SimpleDIVA, because of its specific relevance to voice self-feedback [27]. This is a three-parameter mathematical model that quantifies the three subsystems involved in speech control: auditory feedback control, somatosensory feedback control, and the feedforward control mechanisms underlying sensorimotor adaptation. In this model, the feedforward controller consists of stored motor sequences updated based on sensory errors. Detection of sensory errors occurs from an auditory feedback control component in the model that essentially compares the planned motor speech output from the feedforward controller with the speaker’s auditory signal. Similarly, a somatosensory feedback control component is engaged when somatosensory feedback from the articulators detects errors compared to the planned motor output. In this manner, sensory feedback is used to make near-real-time adjustments to output via error detection. SimpleDIVA offers a new understanding of speech and voice control through a phenomenological explanation for behavioral responses to the adaptation paradigm that are challenging to interpret from behavioral data alone. As the authors state, the SimpleDIVA can help to better understand sensorimotor learning and control differences between normal and disordered groups of speakers, which could ultimately identify new or more refined interventions for those with communication disorders [27].
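To make the interplay of the three subsystems more concrete, the following is a minimal toy simulation written for this discussion. It is not the published SimpleDIVA formulation: the function name `simulate_adaptation`, the update rule, the parameter names (`alpha_a`, `alpha_s`, `lambda_ff`), and all numeric values are illustrative assumptions. It only sketches the general idea that auditory and somatosensory errors produce a within-trial correction, and that a fraction of that correction updates the stored feedforward command.

```python
def simulate_adaptation(perturbation, target=100.0,
                        alpha_a=0.3, alpha_s=0.1, lambda_ff=0.2):
    """Toy trial-by-trial adaptation sketch (illustrative assumptions only).

    perturbation: per-trial shift added to auditory feedback (arbitrary units).
    alpha_a:   hypothetical auditory feedback gain.
    alpha_s:   hypothetical somatosensory feedback gain.
    lambda_ff: hypothetical feedforward learning rate.
    Returns the produced value on each trial.
    """
    feedforward = target  # stored motor command starts on target
    produced = []
    for shift in perturbation:
        heard = feedforward + shift   # auditory feedback carries the perturbation
        felt = feedforward            # somatosensory feedback is not perturbed
        auditory_error = target - heard
        somatosensory_error = target - felt
        # Within-trial correction combines both sensory errors.
        correction = alpha_a * auditory_error + alpha_s * somatosensory_error
        produced.append(feedforward + correction)
        # A fraction of the correction is carried into the next trial.
        feedforward += lambda_ff * correction
    return produced


if __name__ == "__main__":
    # 5 unperturbed trials, 20 trials with an upward shift, 5 trials after removal.
    schedule = [0.0] * 5 + [10.0] * 20 + [0.0] * 5
    for trial, value in enumerate(simulate_adaptation(schedule), start=1):
        print(trial, round(value, 2))
```

In this sketch, an upward shift in auditory feedback drives production downward over trials, the somatosensory term limits how far production drifts from the target, and production does not return to the target immediately once the shift is removed.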
Contemporary speech motor control models include components within their structure that help explain different types of auditory-vocal disorders and the relationship to auditory integration, evidencing underlying mechanisms of sensorimotor-based communication disorders. A valuable means of studying these issues involves a sensorimotor adaptation paradigm [27]. In this paradigm, a perturbation of the speaker’s auditory feedback is created through the real-time modification of the speaker’s formants or fo during the repeated production of a series of words. Generally, after baseline recordings made without auditory feedback manipulation, auditory feedback is altered in three different ways during the repetition of utterances: “ramp”, in which a parameter is shifted incrementally over time; “full-shift” or “hold”, where a parameter is abruptly altered and held for a time; and “post-shift” or “after-effect”, where the alteration is removed [28,32,33].
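The phase structure described above can be sketched as a per-trial shift schedule. This is only an illustration: the function name `build_shift_schedule`, the linear ramp, the use of cents as the unit, and the trial counts in the usage example are assumptions, not values taken from the cited studies.

```python
def build_shift_schedule(n_baseline, n_ramp, n_hold, n_after, max_shift_cents):
    """Return one feedback shift (in cents) per trial.

    Phases, in order: baseline (no shift), ramp (shift grows linearly up to
    max_shift_cents), hold (full shift maintained), after-effect (shift removed).
    The linear ramp and all trial counts are illustrative assumptions.
    """
    baseline = [0.0] * n_baseline
    ramp = [max_shift_cents * (i + 1) / n_ramp for i in range(n_ramp)]
    hold = [float(max_shift_cents)] * n_hold
    after_effect = [0.0] * n_after
    return baseline + ramp + hold + after_effect


if __name__ == "__main__":
    # Hypothetical session: 2 baseline, 4 ramp, 3 hold, and 2 after-effect trials.
    print(build_shift_schedule(2, 4, 3, 2, 100))
```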
Evidence of Auditory-vocal Impairment and Behavioral Voice Disorders
Different classification systems have been proposed for voice disorders. Still, one common designation is the division into functional and organic disorders, with subcategories for functional voice disorders, often referred to as hypofunctional and hyperfunctional disorders. The hyperfunctional category relates to laryngeal muscle strain and ineffective or inadequate phonatory behavior [34]. Hyperfunctional voice disorders are common and have been associated with other diagnoses, such as phonotraumatic and benign lesions of the free edge of the vocal folds [35]. While studies identify a wide range of clinical symptoms and biomechanical and laryngeal configurations, there is an incomplete understanding of the cause(s) of hyperfunctional voice disorders, which have been linked to poor vocal hygiene, aberrant or excessive use of the voice, and psychological and personality factors [32]. Stepp and colleagues, who evaluated auditory-vocal integration impairment in people with diagnosed hyperfunctional voice disorders, hypothesized that such impairment may contribute to developing and maintaining these behavioral voice disorders [32,36]. This hypothesis arose from observing auditory-vocal integration impairment in subjects with hearing loss. Individuals with hearing loss have some voice characteristics similar to those with hyperfunctional dysphonias, such as high glottal resistance, increased phonatory effort, and voice quality changes like strain and breathiness [32,37].
In normal conditions of auditory-vocal integration, when a person is exposed to an increase in the fo of their own voice (feedback), the expected adaptive response is a decrease in the fo of the subject’s own voice; that is, subjects shift their pitch in the opposite direction to the auditory stimulus as a compensatory response [33]. The brain seeks to predict and recapitulate representations that best adapt to external stimuli and sources, creating advanced predictive models with sensory information to minimize error relative to the intended production [38]. Adaptive responses are influenced by interactions between the feedforward and feedback control systems and are seen when feedback is consistently perturbed [30,39].
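The direction of a response to a feedback shift can be illustrated with a short calculation. The cents conversion, 1200 times the base-2 logarithm of a frequency ratio, is the standard musical interval formula; the function names, the `tolerance_cents` threshold, and the example frequencies are hypothetical choices made only for this sketch and do not come from the cited studies.

```python
import math


def hz_to_cents(f_hz, reference_hz):
    """Express f_hz relative to reference_hz in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(f_hz / reference_hz)


def classify_response(baseline_hz, response_hz, shift_cents, tolerance_cents=5.0):
    """Label a vocal response relative to the direction of the feedback shift.

    'opposing'  : produced fo moved against the shift (compensatory direction).
    'following' : produced fo moved in the same direction as the shift.
    'none'      : change smaller than tolerance_cents (a hypothetical threshold).
    """
    change = hz_to_cents(response_hz, baseline_hz)
    if abs(change) < tolerance_cents:
        return "none"
    return "opposing" if change * shift_cents < 0 else "following"


if __name__ == "__main__":
    # Hypothetical speaker with a 200 Hz baseline hearing a +100 cent shift.
    print(classify_response(200.0, 196.0, 100.0))  # fo lowered
    print(classify_response(200.0, 204.0, 100.0))  # fo raised
```

A speaker who lowers fo when feedback is shifted upward would be labeled as opposing, which corresponds to the compensatory pattern described above, whereas a speaker who raises fo would be labeled as following.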
Utilizing the sensorimotor adaptation paradigm, Stepp and colleagues found that the subjects with hyperfunctional voice disorders did not show a typical adaptive response, i.e., when the fo of the feedback was increased, speakers responded by further increasing their fo. The authors interpreted these results as evidence that some people with voice disorders have an auditory-vocal integration impairment, resulting from a deficit between feedforward voice control and auditory feedback. Thus, the presence of an auditory-vocal disorder could explain the occurrence and persistence over time of hyperfunctional vocal behaviors [32], an aspect that the SimpleDIVA model could also explain in terms of a deficit in the correction and adaptation of ongoing vocal production due to errors in auditory feedback [27].
Stepp’s study contributes to understanding how, for example, an initial change in a person’s voice, after phonotraumatic behaviors or an infection of the upper airway, may result in prolonged changes in voice production that can become chronic. An altered voice quality received as feedback from an individual’s own voice, continually altering the feedforward responses of the system, could also help explain why some interventions are not successful for specific individuals. A recent article focused on people with benign vocal fold lesions also implicated auditory-vocal feedback impairments as a factor in developing that specific voice disorder [40]. Lee's study included a group of participants with nodules, polyps, and cysts of the vocal folds as well as non-dysphonic subjects. The participants were asked to produce a sustained vowel under different real-time auditory feedback conditions. Unlike the classic sensory adaptation experiment, the auditory feedback modifications consisted of adding background noise and providing enhanced feedback of the self-produced voice. Lee et al. found that low-frequency modulations (below 3 Hz) of the vocal fo of a sustained vowel were significantly higher for subjects with vocal fold nodules than for the other groups. The authors interpreted the results as supportive of the possibility that vocal fold nodules and their associated vocal behaviors may be linked to abnormal auditory-vocal feedback integration [40].
Improving Auditory Feedback as a Therapeutic Approach
Emerging literature regarding altered sensorimotor integration relating to voice production suggests an intriguing possibility: targeting auditory-vocal feedback control processes might be a helpful component of therapeutic interventions, which means voice production might be improved, in part, through manipulation of sensory mechanisms or auditory feedback [40,41]. However, as auditory-vocal integration is just beginning to be studied with respect to its possible causal role and prevalence in some types of dysphonia, there are still many research questions to be addressed regarding the evaluation and treatment of voice impairment.
Some authors have already started exploring the impact of devices and other interventions to alter or improve auditory feedback on the voice. A recently published study addressed whether auditory feedback control of vocal pitch production in subjects with PD could benefit from Lee Silverman Voice Treatment (LSVT® LOUD) [42]. LSVT LOUD is an intensive voice treatment program that aims to increase voice intensity in people with hypokinetic dysarthria through a sensorimotor recalibration of increased vocal loudness [43,44]. Li's study demonstrated the positive effects of LSVT LOUD on auditory-vocal integration in people with PD [42]. After LSVT LOUD, subjects showed compensatory responses to auditory feedback similar to the performance of healthy subjects. Additionally, significantly greater EEG cortical responses (P2) were observed in response to pitch perturbations after LSVT LOUD, reflecting the intervention's possible top-down modulatory effect on auditory-motor integration for voice regulation in the PD subjects [42].
In addition, it has been shown that the learning of speech motor sequences is based not only in areas of the brain classically related to learning, but also in those associated with auditory and somatosensory feedback-based speech motor learning and in the network of brain regions that participate in both motor and sensory processes [45]. All the above leads us to wonder whether the intensive nature of some voice therapies, with a high number of vocal motor task repetitions and consistently used stimuli, could conceivably improve feedforward phonatory performance, which could also be explained by the DIVA and SimpleDIVA models.
Other tools could favor the therapeutic use of auditory feedback. For example, Escera assessed a device called Forbrain® (Sound For Life Ltd/Soundev, Luxembourg, model UN38.3) as an Altered Auditory Feedback (AAF) device by evaluating changes in voice quality-related acoustic measures such as smoothed cepstral peak prominence (CPPS) and long-term average spectrum (LTAS) [41]. The device gives users real-time enhanced auditory feedback through bone conduction and amplification of the high or low speech frequencies. The results indicated that the Forbrain® altered the voice signal in the manner described by the manufacturer. However, the altered feedback produced some paradoxical results. The values of the trendline of the LTAS were consistent with improved voice quality. Still, the values of the CPPS, a measure associated with voice quality, decreased, which is associated with worsened quality. The author states that this effect may be due to a typical response to AAF devices, where motor feedforward is altered as a consequence of motor adaptation to improve auditory feedback; conversely, motor output is more accurately adjusted when there is altered feedback. These results can be taken as a research opportunity to test this kind of device by setting different types of auditory feedback perturbation. If beneficial, these tools could be of great utility for voice rehabilitation processes and research due to their ease of implementation and design, which allows ecological studies to be performed.
Another interesting finding related to tools that modify auditory feedback is an additional result of Lee's study mentioned above. Bone conduction feedback of the self-produced voice significantly reduced the low-frequency modulations of the vocal fo of a sustained vowel. From this result, the authors suggest that it would be reasonable to incorporate such an auditory feedback aid as a therapeutic modality for vocal fold nodules [40]. Along the same lines, studies in subjects with voice disorders and auditory-vocal impairment could be an excellent opportunity to assess the usefulness of this type of device in a population that would probably benefit the most.
Final Considerations
The production of voice and speech is a complex process that requires interaction with auditory perception. Sensorimotor adaptation provides another avenue to consider in understanding and treating individuals with voice disorders: assessing and manipulating a person’s capabilities relative to vocal motor control. Comprehending key and current aspects linked to speech perception and its disorders opens the door to a broader view on understanding the process of human voice production. It is beneficial for a voice researcher and clinician to advance knowledge of the neurobiological mechanisms that support speech and voice perception and how production is shaped by sensory experience (i.e., auditory and somatosensory). This understanding can lead to novel ways to assess and treat a person who has a voice disorder. Therefore, understanding voice production requires an integrated approach [5], where physiology, acoustics, biomechanics, and neurological processes must be considered holistically and not in isolation. Part of an integrated approach involves determining how voice self-perception and production are related.
Emerging work establishing that auditory-vocal impairment is often present in those with functional dysphonia is an important step that may eventually influence diagnostic and therapeutic voice practice [32,40]. An impairment in auditory perception could impact feedforward processes of voicing and subsequently impact the recovery process after acute dysphonia. Incorporating auditory-vocal integration assessment through sensorimotor adaptation paradigm testing could eventually prove to be an important addition to voice evaluation protocols. Further, if areas of improvement within a person’s auditory-vocal integration can be identified, voice therapy efficacy and efficiency could be increased, leading to improved quality of life and possibly reduced health-related costs. Moreover, DIVA models suggest that motor output changes may become longer lasting when the integration of auditory feedback is sustained within voice therapy. One of the challenges for the future is to take advantage of such information and consider how auditory and somatosensory feedback modifications in subjects with auditory-vocal impairments can be assessed and manipulated. These models can provide important information about the complex and multifactorial nature of the voice production process, which clearly is linked to a person’s auditory and somatosensory voice perception.
Many challenges remain regarding the relationship between voice and auditory feedback. An alteration in the integration between auditory feedback and voice production appears to be a potentially important issue for some people who have a voice disorder. However, the best ways to identify and characterize how a person’s auditory-vocal integration is impaired have not been developed to a point where they can be applied within a clinical setting. Similarly, the best approaches to modify and improve a person’s auditory-vocal integration capabilities remain to be developed.