All-pole modeling is a widely used formant estimation method, but its performance is known to deteriorate for high-pitched voices. To address this problem, several all-pole modeling methods that are robust to fundamental frequency have been proposed. This study compares five such previously known methods and introduces a new technique, Weighted Linear Prediction with Attenuated Main Excitation (WLP-AME). WLP-AME uses temporally weighted linear prediction (LP), in which the square of the prediction error is multiplied by a given parametric weighting function. The weighting reduces the contribution of the main excitation of the vocal tract when the filter coefficients are optimized. Consequently, the resulting all-pole model is affected more by the characteristics of the vocal tract, leading to less biased formant estimates. Experiments with synthetic vowels created with a physical modeling approach showed that WLP-AME yielded more accurate formant frequency estimates for high-pitched sounds than the previously known methods (e.g., the relative error in the first formant of the vowel [a] decreased from 11% to 3% when conventional LP was replaced with WLP-AME). Experiments conducted on natural vowels indicated that the formants detected by WLP-AME changed in a more regular manner between repetitions of different pitch than those computed by conventional LP.
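The weighted error criterion described above can be illustrated with a minimal sketch. It assumes the covariance-style formulation in which the coefficients minimize the sum of w[n] times the squared prediction error, solved through the weighted normal equations. The function names `weighted_lp` and `ame_like_weights`, and the rectangular shape of the weighting function, are hypothetical simplifications chosen for illustration; the parametric weighting function actually used in WLP-AME is not specified in the abstract.

```python
import numpy as np

def weighted_lp(x, w, order):
    """Estimate predictor coefficients a[1..p] that minimize
    sum_n w[n] * (x[n] - sum_k a[k] * x[n-k])**2  (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n = np.arange(order, len(x))            # samples with a full prediction history
    # Column k-1 holds the delayed signal x[n-k]
    X = np.column_stack([x[n - k] for k in range(1, order + 1)])
    Xw = X * w[n, None]                     # apply the temporal weights row by row
    R = X.T @ Xw                            # weighted covariance matrix
    r = Xw.T @ x[n]                         # weighted cross-correlation vector
    return np.linalg.solve(R, r)

def ame_like_weights(length, excitation_idx, width=3, floor=0.01):
    """Hypothetical rectangular weighting: value `floor` for `width` samples
    starting at each excitation instant, and 1 elsewhere."""
    w = np.ones(length)
    for i in excitation_idx:
        w[i:i + width] = floor
    return w
```

With `w` set to all ones, the sketch reduces to conventional covariance-method LP. Lowering the weights around the excitation instants makes the fit depend mainly on samples where the signal follows the resonator alone, which is the intuition behind the reduced bias reported for high-pitched voices.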
Voice training exploits semiocclusives, which increase vocal tract interaction with the source. Modeling results suggest that vocal economy (maximum flow declination rate, MFDR, divided by maximum area declination rate, MADR) is improved by matching the glottal and vocal tract impedances. Changes in MADR may be correlated with thyroarytenoid (TA) muscle activity. Here the effects of impedance matching are studied for laryngeal muscle activity and glottal resistance. One female subject repeated [pa:p:a] before and immediately after (a) phonation into different-sized tubes and (b) the voiced bilabial fricative [β:]. To allow estimation of subglottic pressure from the oral pressure, [p] was also inserted in the repetitions of the semiocclusions. Airflow was registered using a flow mask. EMG was registered from the TA, cricothyroid (CT) and lateral cricoarytenoid (LCA) muscles. Phonation was simulated using a 7 × 5 × 5 point-mass model of the vocal folds, allowing inputs of simulated laryngeal muscle activation. The variables were TA, CT and LCA activities. Increased vocal tract impedance caused the subject to raise TA activity compared to CT and LCA activities. Computer simulation showed that higher glottal economy and efficiency (oral radiated power divided by aerodynamic power) were obtained with a higher TA/CT ratio when LCA activity was tuned for ideal adduction.
Voiced obstruents and phonation into tubes are widely used as vocal exercises. They increase the inertive reactance of the vocal tract in the 200-1000 Hz range and thereby reinforce vocal fold vibration. However, the effect is strong only when the epilarynx tube is also narrowed. The present study focused on the effects of a 'resonance tube' (27 cm in length, 0.5 cm² cross-sectional area, hard walls) on vocal tract reactance and the accompanying economy of voice production (defined as maximum flow declination rate, MFDR, divided by maximum area declination rate, MADR). The vowel /u/ and phonation into the tube were simulated with a computer model. Three values were given to the cross-sectional area of the epilarynx tube (0.2 cm², 0.5 cm², and 1.6 cm²), which is at the opposite end of the vocal tract from the artificial 'resonance tube'. The degree of glottal adduction was varied in order to find the economy maximum for each epilarynx tube setting. Results showed that the 'resonance tube' lowered F1 from 300 Hz to 150 Hz and doubled the vocal tract inertive reactance at F0 = 100 Hz. The largest economy with the 'resonance tube' was obtained when the epilarynx tube was narrowed (relative to the rest of the vocal tract) and sufficiently tight adduction was used. Most importantly, the intraoral acoustic pressure (calculated at 0.8 cm behind the lips) was tripled with the tube. The results suggest that by optimizing the vibratory sensations in the face that are attributed to increased intraoral acoustic pressure, phonation into a tube may assist a trainee in finding an optimal glottal and epilaryngeal setting for the greatest vocal economy.
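The economy measure defined above (MFDR divided by MADR) can be computed from sampled waveforms as in the following minimal sketch. It assumes a glottal flow signal in cm³/s and a glottal area signal in cm², both uniformly sampled at `fs` Hz, and it estimates each declination rate as the steepest negative slope of a finite-difference derivative. The function names are hypothetical, and the studies' own simulation code may compute these quantities differently.

```python
import numpy as np

def max_declination_rate(signal, fs):
    """Largest rate of decrease of a sampled signal (returned as a positive
    number), estimated with finite differences."""
    slope = np.gradient(np.asarray(signal, dtype=float), 1.0 / fs)
    return float(np.max(-slope))

def vocal_economy(flow, area, fs):
    """Economy = MFDR / MADR. With flow in cm^3/s and area in cm^2,
    the ratio has units of cm/s."""
    mfdr = max_declination_rate(flow, fs)   # cm^3/s^2
    madr = max_declination_rate(area, fs)   # cm^2/s
    return mfdr / madr
```

For noisy measured signals, some smoothing before differentiation would likely be needed, since the maximum of a finite-difference derivative is sensitive to noise.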