Introduction
Parkinson’s disease (PD) is a chronic neurodegenerative disorder caused by the progressive loss of dopaminergic neurons, primarily in the substantia nigra pars compacta but also in other areas of the brain [1,2]. Clinically, PD manifests as the cardinal motor impairments of tremor, rigidity, akinesia or bradykinesia, and postural instability, as well as non-motor impairments of mood, cognition, and autonomic and sensory systems [3]. Furthermore, approximately 90% of patients with PD develop hypokinetic dysarthria during the course of their illness [4-6]. Hypokinetic dysarthria is characterized by vocal (e.g., hypophonia, harsh/breathy voice), articulatory (e.g., imprecise consonants), and prosodic (e.g., monopitch, reduced stress, and variations in speech rate) impairments. Among these, vocal impairments have been cited as among the earliest, most frequent, and most salient symptoms in people with PD [6-12]. In their seminal investigation based on perceptual analyses of speech, Logemann and colleagues reported that 89% of 200 patients with PD had voice impairments and 45% had articulatory impairments. Rates of co-occurrence also showed a predominance of voice impairments, with over 45% of the patients showing voice impairments alone.
Researchers and clinicians often evaluate the degree of these vocal impairments using sustained vowel phonations and connected speech tasks [e.g., 9,13-23]. Prior research has relied extensively on sustained vowel phonations for three primary reasons: (a) the production of sustained vowels is a fairly easy and repeatable speech task, (b) acoustic measures of voicing (e.g., jitter, shimmer, cepstral peak prominence) are readily available from several commercial software packages (e.g., KayPENTAX, Montvale, NJ), and (c) analysis of connected speech is more complex due to confounding cognitive, articulatory, and prosodic variables. In addition to these conventional tasks, vocal impairments have also been examined using diadochokinetic (DDK) tasks that require patients to produce /Ɂ/ and/or /hʌ/ at a rapid pace [24,25]. These studies, in PD patients with older (>60 years) and younger (<50 years) disease onset, have revealed lower repetition rates and regularity compared to age-matched controls. Such maximum performance tasks assess the patient’s ability to alternate vocal fold abduction and adduction. They have also been used as a measure of speech deterioration in other neuromotor diseases such as amyotrophic lateral sclerosis [26] and, more recently, to examine the effects of aging across the adult lifespan in healthy adults [27].
Alternatively, laryngeal coordination can be assessed through a vocal pitch control task that assesses the patient’s ability to modify vocal fold tension by alternating between high and low pitch. However, only a few studies have examined such a task, and none of them in disordered populations [28-30]. In a seminal investigation, Sundberg [29] measured the maximum speed of pitch change in trained professional voice users. Five male and four female singers were asked to alternate between two given pitches as rhythmically and as quickly as possible. Results were compared to those of five male and six female untrained participants who were given voice training for one week. Instructions were augmented with distinct hand movements or with clicks in an earphone combined with flashes of a lamp. Fundamental frequency (f0) was measured using a low-pass filter connected to a zero-crossing detector. Response time was measured as the time needed by the subject to produce 6/8 of the pitch change. In singers, the response time was shorter, suggesting that they were able to perform pitch elevations more quickly due to their musical training experience. Additionally, rise time was longer than fall time in untrained subjects, whereas this difference was less pronounced in singers. These results were supported by physiological explanations of muscle memory being greater in singers and the properties of the cricothyroid muscle (development/training).
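The response-time measure described above can be illustrated with a minimal sketch. The source states only that response time is the time needed to produce 6/8 of the pitch change; the sketch assumes this means the central portion of the transition (from 1/8 to 7/8 of the total change), and the function names and sample values are hypothetical.

```python
def crossing_time(times, f0, level):
    """First time at which the f0 track reaches `level`, by linear interpolation."""
    for i in range(len(f0) - 1):
        t0, y0, t1, y1 = times[i], f0[i], times[i + 1], f0[i + 1]
        if y0 == level:
            return t0
        if (y0 - level) * (y1 - level) < 0:  # level lies strictly between the samples
            return t0 + (level - y0) * (t1 - t0) / (y1 - y0)
    if f0[-1] == level:
        return times[-1]
    raise ValueError("f0 track never reaches the requested level")


def response_time(times, f0, lower=1 / 8, upper=7 / 8):
    """Time taken to cover the central part of one pitch transition.

    Assumption: the 6/8 of the pitch change is measured from 1/8 to 7/8
    of the distance between the first and last f0 samples.
    """
    start, end = f0[0], f0[-1]
    t_lower = crossing_time(times, f0, start + lower * (end - start))
    t_upper = crossing_time(times, f0, start + upper * (end - start))
    return t_upper - t_lower


# Hypothetical rising transition from 100 Hz to 200 Hz in 80 ms.
times = [0.00, 0.02, 0.04, 0.06, 0.08]
f0 = [100.0, 125.0, 150.0, 175.0, 200.0]
rt = response_time(times, f0)  # seconds
```

The same function applies to a falling transition, because the levels are defined relative to the first and last samples of the segment.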
A majority of the DDK studies in people with PD have investigated limb-motor (e.g., finger tapping; [31,32]) and oral-motor function (e.g., /pʌ/, /pʌtʌkʌ/; [33-36]). However, there is a paucity of literature assessing the functional integrity and efficiency of the laryngeal system through dynamic vocal range tasks, as evidenced by the limited literature cited above. Although tasks that target laryngeal DDK are far less common, they have significant potential to reveal neuromuscular mechanisms (e.g., control and coordination) and impairments (e.g., rigidity) affecting the laryngeal system. Therefore, the purpose of this study was to explore laryngeal coordination in people with PD through a rapid/speeded pitch DDK task based on (a) the assumption that dysarthria in PD largely presents with vocal impairments, (b) the need for a maximum performance task to assess laryngeal tension control, and (c) the expectation that dynamic measures of the laryngeal system will provide more insight into the temporal control of movements than measures obtained from a ‘steady state’ vowel production.
Methods
This study was approved by the institutional review board at Michigan State University (IRB Nos. 12-967 and 13-443). Written informed consent was provided by all the participants before speech and voice recording.
Participants
Sixteen native speakers of American English, including 8 people (7 M and 1 F) with a clinical diagnosis of ‘idiopathic’ Parkinson’s disease (PD) and 8 controls (7 M and 1 F), were recruited for the current study. The mean±standard deviation ages of the PD and control participants were 62±10 (range = 50 to 74) and 56±5 (range = 51 to 64) years, respectively. Participants were recruited from different support groups around the Lansing community and the Neuro-Ophthalmology clinic at Michigan State University. The mean disease duration since the time of diagnosis was 7±6 (range = 1 to 19) years. All participants passed the hearing screening (air-conduction pure-tone thresholds below 40 dB HL at 500 Hz, 1 kHz, 2 kHz, and 4 kHz in at least one ear). They also passed the cognitive and depression screenings, evaluated by the Mini-Mental State Examination (MMSE) [37] and Beck Depression Inventory (BDI) [38], respectively. The cognitive screening ensured that all participants could follow the task instructions. The depression screening was completed because of the known relationship between severity of depression and measures of pitch (e.g., limited pitch range in people who are depressed). MMSE and BDI scores were not available for the first three participants; self-reports of cognition and depression were obtained for these three participants instead. For the remainder of participants with PD, the mean MMSE score was 28.6±1.92 (range = 25.5 to 30) and the mean BDI score was 7.6±3.29 (range = 4 to 12). To ensure that participants did not have any other co-morbid voice impairments, self-reports of medical history were obtained. None of the participants had an acute upper respiratory tract infection at the time of voice recording. Participants were excluded if they had prior musical training. None of the participants had received prior speech/voice therapy.
Most of the participants were taking levodopa-carbidopa medication, and recordings were obtained during their “ON” state (mean time since last medication was approximately 3 hours). Table 1 lists the demographic characteristics of all participants.
Subject Number | Group | Age (years) | Sex | Years since diagnosis | MMSE (≥25) | BDI (≤13) | Time since last medication (hours) |
---|---|---|---|---|---|---|---|
S01 | PD | 53 | M | 7 | NA | NA | 3.5 |
S02 | PD | 50 | M | 6 | NA | NA | 3.5 |
S03 | PD | 54 | F | 3 | NA | NA | 3 |
S04 | PD | 66 | M | 3 | 28 | 6 | 2 |
S05 | PD | 58 | M | 19 | 25.5 | 10 | 0.5 |
S06 | PD | 71 | M | 1 | 29.5 | 4 | 4 |
S07 | PD | 74 | M | 7 | 30 | 12 | 3 |
S08 | PD | 72 | M | 11 | 30 | 6 | 1 |
S01 | HC | 52 | M | | | | |
S02 | HC | 51 | M | | | | |
S03 | HC | 52 | F | | | | |
S04 | HC | 64 | M | | | | |
S05 | HC | 59 | M | | | | |
S06 | HC | 56 | M | | | | |
S07 | HC | 60 | M | | | | |
S08 | HC | 57 | M | | | | |
Note. MMSE: Mini-Mental State Examination; BDI: Beck Depression Inventory; NA: not available.
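As a quick consistency check, the group summaries quoted in the Participants section can be recomputed from the Table 1 values. The sketch below uses only the ages and disease durations listed in the table; the variable and function names are illustrative, and the sample (n-1) standard deviation is assumed.

```python
from statistics import mean, stdev

# Values copied from Table 1.
PD_AGES = [53, 50, 54, 66, 58, 71, 74, 72]
HC_AGES = [52, 51, 52, 64, 59, 56, 60, 57]
PD_YEARS_SINCE_DIAGNOSIS = [7, 6, 3, 3, 19, 1, 7, 11]


def mean_sd(values):
    """Return (mean, sample standard deviation), each rounded to the nearest integer."""
    return round(mean(values)), round(stdev(values))


pd_age = mean_sd(PD_AGES)                    # mean and SD of PD ages
hc_age = mean_sd(HC_AGES)                    # mean and SD of control ages
duration = mean_sd(PD_YEARS_SINCE_DIAGNOSIS) # mean and SD of disease duration
```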
Experimental Procedure
Voice samples were recorded using a Shure Beta53 high-quality omnidirectional head-worn condenser microphone, digitized directly to a TASCAM DR-40 recorder with a 44.1 kHz sampling frequency and 16-bit quantization. The microphone-to-mouth distance was approximately 5 cm at an angle of about 45 degrees. All recordings were made in a sound booth. Participants were instructed by the author (S.A.) to rapidly transition or alternate between a chosen comfortable low and high pitch (on the vowel /a/) and to complete the task as a pitch glide. Since the goal was not to analyze absolute pitch matching accuracy, participants could choose any pitch that elicited a healthy phonation without voice breaks or fatigue. Participants were provided with step-by-step instructions, demonstrations, and visual feedback via PRAAT software [39] to ensure that they chose an adequate pitch glide range and performed the task as rapidly as they could for approximately 5-10 seconds. Step 1 was to choose the lowest comfortable pitch and produce it in 2 trials. Step 2 was to choose the highest comfortable pitch and produce it in 2 trials. Step 3 was to perform a pitch glide between the selected pitches at a habitual rate for four to five cycles. Feedback was provided and participants were asked to repeat Step 3 for one more trial. Step 4 was to perform the pitch glide between the selected pitches at a fast rate for four to five cycles. Feedback was provided and participants were asked to repeat Step 4 for one more trial. No additional practice trials were completed by any of the participants. Feedback took the form of demonstration by the author as well as visual feedback from PRAAT software. The goal was to ensure smooth pitch glides without any breaks and with adequate voice quality and range. At least four cycles were secured for each experimental trial, and a total of three experimental trials were elicited from each participant.
During the actual experimental task, there was no visual feedback.
Computational Measures
A custom-designed algorithm was developed in MATLAB (version R2015b; MathWorks, Natick, MA) to measure quantitative metrics of pitch range and slope (range/time). First, the pitch contour was extracted using the Auditory Sawtooth Waveform Inspired Pitch Estimator-Prime algorithm (Aud-SWIPE'; [40,41]). Aud-SWIPE' filters the audio signal in a similar way to the outer and middle ear by flattening the spectral envelope and using a perceptually motivated filterbank. The output from each channel of the filterbank is half-wave rectified in a similar way to inner hair cell rectification. Then, the rectified channel signals are converted to spectral magnitude, square-root compressed, and summed across the channels, which approximates a specific loudness function. The Fast Fourier Transform (FFT) analysis frame size is about eight fundamental periods of each pitch candidate value, with 50% overlap between adjacent frames. The pitch candidates are spaced between 80 and 400 Hz with 48 pitch candidates per octave. The specific loudness function of the signal is correlated with the specific loudness function of a sawtooth waveform for all pitch candidates, and the candidate with the highest correlation (normalized between 0 and 1) is determined to be the pitch. Since the Aud-SWIPE' algorithm uses a biologically inspired frequency scale (equivalent rectangular bandwidth, ERB), unlike FFT-based algorithms that use linearly spaced bins, it has been shown to be more robust than other algorithms in estimating pitch [42,43]. Furthermore, prior research has demonstrated that Aud-SWIPE' provides robust estimates of pitch even in severely dysphonic voices compared to conventional fundamental frequency (f0) tracking algorithms, because it does not depend on signal periodicity [44]. The Aud-SWIPE' algorithm exported the instantaneous pitch values to an Excel sheet. Second, automated MATLAB plots allowed visual inspection of the pitch contour and hand-correction of any halving/doubling errors.
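The candidate spacing and frame sizes quoted above can be made concrete with a short sketch. This is not the Aud-SWIPE' implementation; it only works through the stated numbers (80 to 400 Hz, 48 candidates per octave, about eight fundamental periods per frame) under the assumptions that candidates are logarithmically spaced starting at 80 Hz and that the 44.1 kHz recording rate applies. Function names are hypothetical.

```python
import math


def pitch_candidates(f_min=80.0, f_max=400.0, per_octave=48):
    """Log-spaced pitch candidates from f_min up to (at most) f_max."""
    n_steps = math.floor(per_octave * math.log2(f_max / f_min))
    return [f_min * 2 ** (k / per_octave) for k in range(n_steps + 1)]


def frame_length_samples(f0, fs=44100, periods=8):
    """Number of samples spanning `periods` fundamental periods of f0."""
    return round(periods * fs / f0)


candidates = pitch_candidates()
n_candidates = len(candidates)              # size of the candidate grid
longest_frame = frame_length_samples(80.0)  # frame for the lowest candidate
shortest_frame = frame_length_samples(400.0)  # frame for a 400 Hz pitch
```

Under these assumptions the lowest candidates need frames about five times longer than the highest ones, which is why the frame size is tied to each candidate rather than fixed.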
The Aud-SWIPE' algorithm was then re-run with the corrected pitch values to obtain more accurate instantaneous values. Pitch range was computed as the difference between the two self-chosen pitches, and pitch slope was calculated as the ratio of the pitch glide range to the glide duration. For both measures, rise cycles are reported as “positive” and fall cycles as “negative”.
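The range and slope definitions above can be sketched as follows. This is an illustrative Python reconstruction, not the authors' MATLAB code: it assumes the contour is given in Hz, that each rise or fall is delimited by local turning points of the contour, and that slope is expressed in Hz per second. All names and sample values are hypothetical.

```python
def turning_points(f0):
    """Indices of the first sample, every local extremum, and the last sample."""
    idx = [0]
    for i in range(1, len(f0) - 1):
        # A sign change in the first difference marks a peak or a trough.
        if (f0[i] - f0[i - 1]) * (f0[i + 1] - f0[i]) < 0:
            idx.append(i)
    idx.append(len(f0) - 1)
    return idx


def glide_measures(times, f0):
    """Signed (range, slope) for each rise or fall between turning points.

    Rises yield positive values and falls yield negative values.
    """
    pts = turning_points(f0)
    measures = []
    for a, b in zip(pts, pts[1:]):
        pitch_range = f0[b] - f0[a]      # Hz
        duration = times[b] - times[a]   # seconds
        measures.append((pitch_range, pitch_range / duration))
    return measures


# Hypothetical contour: one rise followed by one fall.
times = [0.0, 0.1, 0.2, 0.3, 0.4]
f0 = [100.0, 150.0, 200.0, 150.0, 100.0]
measures = glide_measures(times, f0)
```

A real contour would first need smoothing or a minimum-excursion criterion so that small fluctuations are not counted as turning points.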
Results
Descriptive statistics for pitch range and slope measures are depicted in Figures 1 and 2, respectively. Pitch range was marginally lower in people with PD compared to controls for both rise and fall cycles. There was no difference between the rise/positive and fall/negative cycles for either speaker group. Similar to the pitch range, pitch slope was marginally lower in people with PD for both cycles. In contrast to the pitch range, pitch slopes were lower in rise/positive cycles compared to fall/negative cycles in both speaker groups. Due to the small sample size, medians and non-parametric tests were used for statistical analysis. A Mann-Whitney U test was used to examine differences in pitch range and slope measures between the speaker groups. Results revealed no significant differences between the two groups for either pitch measure (p>0.05). Effect size calculated as
Given the heterogeneity of symptoms in people with PD, individual participant data were also examined. To do so, each participant with PD was compared with a healthy control participant. Accordingly, Figures 3, 4, 5, 6, 7, 8, 9 and 10 depict potential individual differences across talkers with PD. For the first 5 participant comparisons, age-matching was within ±1 or ±2 years. Each figure represents the raw data with the 3 experimental trials (trial 1 - blue, trial 2 - green, and trial 3 - red). A visual analysis of the figures revealed markedly lower pitch slopes in PD for half of the participants, namely S02 (Figure 4), S05 (Figure 7), S07 (Figure 9), and S08 (Figure 10). However, it is important to note that for S07 and S08, the age difference between PD and HC was greater than 10 years. PD S03 and S04 demonstrate festination/hastening patterns (a greater number of cycles within the 4 to 5 s time duration) similar to those found in motor and oral DDK movements and speech. Otherwise, there seems to be no difference in the number of pitch DDK cycles between PD and HC. Moreover, the data do not reveal any major fatigue effects for either population, which would have been evidenced by smaller slope values across the three successive trials (colored lines in the figures).
The absolute range and slope values for rise/positive and fall/negative cycles were averaged to investigate the potential effects of age and PD disease duration. As age increased, pitch range decreased in both people with PD (r = -0.58) and controls (r = -0.51). Similarly, pitch slope decreased with increasing age in people with PD (r = -0.73; p = 0.04) and controls (r = -0.32). As disease duration increased in people with PD, both pitch range and slope measures decreased (r = -0.52 and -0.45, respectively).
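The two statistics used in this section, the Mann-Whitney U statistic for the group comparison and the correlation coefficient r for the age and duration effects, can be sketched in plain Python. The sketch assumes r is Pearson's coefficient, which the text does not state, and the input values are made-up toy numbers rather than study data. It computes only the statistics; p-values would normally come from a statistics package (for example, `scipy.stats.mannwhitneyu`).

```python
import math


def mann_whitney_u(x, y):
    """U statistic for sample x: count of pairs with x_i > y_j (ties count 0.5)."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def pearson_r(x, y):
    """Pearson product-moment correlation between paired samples x and y."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy values for illustration only (not the study's measurements).
group_a = [1.0, 2.0, 3.0]
group_b = [4.0, 5.0, 6.0]
u_a = mann_whitney_u(group_a, group_b)
u_b = mann_whitney_u(group_b, group_a)

ages = [50.0, 60.0, 70.0]
slopes = [6.0, 4.0, 2.0]
r = pearson_r(ages, slopes)
```

The two U values always sum to the product of the sample sizes, which gives a simple sanity check on the pairwise counting.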
Discussion
The use of maximum performance tasks, such as diadochokinetic (DDK) tasks, has been a common standard in the clinical assessment of motor speech disorders [26,36]. There is a significant preference for a higher-faster-further approach to motor speech testing, since these tasks frequently push the limits of motor, respiratory, vocal tract, or laryngeal performance [45]. A component of normal voice production is the modification of pitch. The current study examined a novel laryngeal maximum performance task, in the form of a speeded pitch DDK task, to explore the ability of older adults with and without PD to modify their vibrating vocal fold mass, the vertical laryngeal position, and vocal fold tension, rather than the vocal fold abduction/adduction mechanism. It is probable that reductions in maximal performance in PD patients are caused, at least in part, by physiological deficiencies. The main results are that (a) pitch range and slope are slightly reduced in people with PD compared to controls, (b) there is a significant age effect on pitch slope in people with PD, and (c) performance varies across individual participants.
In addition to serving as a useful marker of disease progression, maximum performance tasks might identify speakers who have a “reduced reserve” (i.e., whose speech impairment manifests only in situations or contexts that tax or increase the complexity of the vocal production mechanism) [45]. In this study, this effect was observed in two of the PD patients. The significant age effect found only in PD patients, combined with the effects of disease duration, may demonstrate the merits of the pitch slope measure, and of evaluating individual variability and measurement over time, as a marker of disease progression. A future study with a larger age-matched sample and balanced sex ratios is needed to confirm these results. Like conventional oral and laryngeal DDK tasks, a pitch DDK task can be administered within a few minutes to healthy older adults and disordered populations. It also permits analysis of well-controlled repetitions, and the computational analysis can be fully automated (and is therefore intrinsically more reliable and reproducible), as demonstrated in this study.
Regardless of statistical significance, the task itself (i.e., a pitch glide) is a common component of the assessment and treatment of several other voice disorders. Pitch range can be affected in people with hyperfunctional voice disorders such as muscle tension dysphonia, in people with benign lesions such as vocal nodules, and in people with neurological disorders. Indeed, atypical laryngeal DDK results for rate and stability have been reported in people with vocal tremor [46] and spasmodic dysphonia [46,47]. Pitch glides are also a part of comprehensive physiological voice therapy programs such as vocal function exercises [48], Lee Silverman Voice Treatment [49], and SPEAK OUT!® [50], which target improvement of pitch range and prosody during conversational speech. A speeded maximum performance version, while more beneficial for assessment, can also be used in treatment to increase complexity over treatment sessions. Producing a vocal siren, as in the pitch DDK task, requires simultaneous regulation of multiple parameters, including maintenance of constant volume and voice quality, and prior research [51] has shown that targeted voice training can decrease PD symptoms.
In addition to using a larger sample, future studies should (a) focus on instructions and training (i.e., effects of practice and number of trials for optimum performance), (b) develop a large normative database, (c) investigate the validity and reliability of the pitch DDK task, (d) analyze speed vs. range trade-offs, (e) develop quantitative measures beyond the conventional rate, range, and regularity, (f) compare oral, laryngeal, and pitch DDK in motor speech and voice disorders, and (g) examine physiology to provide direct explanations of the speed of pitch change. Overall, the collective use of disorder-specific variables pertaining to pitch, energy, and temporal variability may provide new diagnostic and therapeutic insights and improve the clinical utility of DDK.
Conclusions
The current study explored a novel pitch DDK task. Pitch range and slope were derived automatically from pitch contours estimated with the Aud-SWIPE' algorithm. Both range and slope measures were slightly reduced in people with PD compared to controls. Moreover, there was an effect of age and disease duration in people with PD. An analysis with a larger age- and sex-balanced dataset is warranted for a comprehensive evaluation. From a clinical perspective, such a task has the potential to be integrated into future diagnostic or therapeutic practice because (a) it may be sensitive to early stages of PD, and (b) it can be reliably derived from an easily performed task with minimal time and equipment requirements.