Acoustical analyses were carried out on a set of utterances produced by two male speakers talking in quiet and in 80, 90, and 100 dB SPL of masking noise. In addition to replicating previous studies demonstrating increases in amplitude, duration, and vocal pitch while talking in noise, these analyses also found reliable differences in the formant frequencies and short-term spectra of vowels. Perceptual experiments were also conducted to assess the intelligibility of utterances produced in quiet and in noise when they were presented at equal S/N ratios for identification. In each experiment, utterances originally produced in noise were found to be more intelligible than utterances produced in quiet. The results of the acoustic analyses showed clear and consistent differences in the acoustic-phonetic characteristics of speech produced in quiet versus noisy environments. Moreover, these acoustic differences produced reliable effects on intelligibility. The findings are discussed in terms of: (1) the nature of the acoustic changes that take place when speakers produce speech under adverse conditions such as noise, psychological stress, or high cognitive load; (2) the role of training and feedback in controlling and modifying a talker's speech to improve the performance of current speech recognizers; and (3) the development of robust algorithms for recognition of speech in noise.
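The equal-S/N presentation described above can be made concrete with a short sketch. The following Python fragment is an illustration only, not the study's actual procedure; the placeholder signals and the target SNR are assumptions. It scales a masking-noise signal so that a speech token is mixed at a specified S/N ratio:

```python
# Illustrative sketch: mixing speech and masking noise at a fixed S/N ratio.
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so the speech-to-noise ratio equals `snr_db`, then mix."""
    noise = noise[: len(speech)]                 # match lengths
    p_speech = np.mean(speech ** 2)              # mean power of speech
    p_noise = np.mean(noise ** 2)                # mean power of noise
    # Required noise power for the target SNR: SNR_dB = 10 * log10(Ps / Pn)
    target_p_noise = p_speech / (10 ** (snr_db / 10))
    return speech + noise * np.sqrt(target_p_noise / p_noise)

# Hypothetical placeholder signals (one second at 16 kHz)
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
noise = rng.standard_normal(16000)
mixed = mix_at_snr(speech, noise, snr_db=0.0)    # present at 0 dB S/N
```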
Durations of the vocalic portions of speech are influenced by a large number of linguistic and nonlinguistic factors (e.g., stress and speaking rate). However, each factor affecting vowel duration may influence articulation in a unique manner. The present study examined the effects of stress and final-consonant voicing on the detailed structure of articulatory and acoustic patterns in consonant-vowel-consonant (CVC) utterances. Jaw movement trajectories and F1 trajectories were examined for a corpus of utterances differing in stress and final-consonant voicing. Jaw lowering and raising gestures were more rapid, longer in duration, and spatially more extensive for stressed versus unstressed utterances. At the acoustic level, stressed utterances showed more rapid initial F1 transitions and more extreme F1 steady-state frequencies than unstressed utterances. In contrast to the results obtained in the analysis of stress, decreases in vowel duration due to devoicing did not result in a reduction in the velocity or spatial extent of the articulatory gestures. Similarly, at the acoustic level, the reductions in formant transition slopes and steady-state frequencies demonstrated by the shorter, unstressed utterances did not occur for the shorter, voiceless utterances. The results demonstrate that stress-related and voicing-related changes in vowel duration are accomplished by separate and distinct changes in speech production with observable consequences at both the articulatory and acoustic levels.
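A minimal sketch of how the kinematic measures named above (velocity, duration, and spatial extent of a gesture) might be computed from a sampled jaw trajectory. This is an assumed illustration, not the study's analysis code, and the synthetic trajectory is hypothetical:

```python
# Illustrative sketch: kinematic measures of a single jaw-lowering gesture.
import numpy as np

def gesture_kinematics(position: np.ndarray, fs: float) -> dict:
    """position: jaw displacement in mm for one gesture, sampled at fs Hz."""
    velocity = np.gradient(position) * fs        # mm/s, via central differences
    return {
        "peak_velocity_mm_s": float(np.max(np.abs(velocity))),
        "duration_s": len(position) / fs,
        "spatial_extent_mm": float(position.max() - position.min()),
    }

# Hypothetical example: a smooth 10-mm, 200-ms lowering gesture at 500 Hz
fs = 500.0
t = np.linspace(0.0, 0.2, int(0.2 * fs))
trajectory = 10.0 * (1 - np.cos(np.pi * t / 0.2)) / 2
print(gesture_kinematics(trajectory, fs))
```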
The present investigation examined the effects of cognitive workload on speech production. Workload was manipulated by having talkers perform a compensatory visual tracking task while speaking test sentences of the form "Say hVd again." Acoustic measurements were made to compare utterances produced under workload with the same utterances produced in a control condition. In the workload condition, some talkers produced utterances with increased amplitude and amplitude variability, decreased spectral tilt and F0 variability, and increased speaking rate. No changes in F1, F2, or F3 were observed across conditions for any of the talkers. These findings indicate both laryngeal and sublaryngeal adjustments in articulation, as well as modifications in the absolute timing of articulatory gestures. The results of a perceptual identification experiment paralleled the acoustic measurements. Small but significant advantages in intelligibility were observed for utterances produced under workload for talkers who showed robust changes in speech production. Changes in amplitude and amplitude variability for utterances produced under workload appeared to be the major factor controlling intelligibility. The results of the present investigation support the assumptions of Lindblom's ["Explaining phonetic variation: A sketch of the H&H theory," in Speech Production and Speech Modeling (Kluwer Academic, The Netherlands, 1990)] H&H model: Talkers adapt their speech to suit the demands of the environment and these modifications are designed to maximize intelligibility.
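Spectral tilt, one of the measures reported above, is commonly quantified as the slope of a line fit to the log-magnitude spectrum; a flatter (less negative) slope corresponds to decreased tilt. The following Python sketch illustrates one such estimate under that assumption; it is not the study's measurement procedure, and the analysis band is a placeholder choice:

```python
# Illustrative sketch: spectral tilt as the slope of a linear fit to the
# log-magnitude spectrum, in dB per kHz.
import numpy as np

def spectral_tilt_db_per_khz(signal: np.ndarray, fs: float) -> float:
    windowed = signal * np.hanning(len(signal))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    keep = (freqs > 50) & (freqs < 5000)         # assumed speech-relevant band
    mag_db = 20 * np.log10(spectrum[keep] + 1e-12)
    slope, _intercept = np.polyfit(freqs[keep] / 1000.0, mag_db, 1)
    return float(slope)                          # less negative = flatter tilt
```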
Previous research has shown that F1 offset frequencies are generally lower for vowels preceding voiced consonants than for vowels preceding voiceless consonants. Furthermore, it has been shown that listeners use these differences in offset frequency in making judgments about final-consonant voicing. A recent production study [W. Summers, J. Acoust. Soc. Am. 82, 847-863 (1987)] reported that F1 frequency differences due to postvocalic voicing are not limited to the final transition or offset region of the preceding vowel. Vowels preceding voiced consonants showed lower F1 onset frequencies and lower F1 steady-state frequencies than vowels preceding voiceless consonants. The present study examined whether F1 frequency differences in the initial transition and steady-state regions of preceding vowels affect final-consonant voicing judgments in perception. The results suggest that F1 frequency differences in these early portions of preceding vowels do, in fact, influence listeners' judgments of postvocalic consonantal voicing.
This study examined speech intelligibility and preferences for omnidirectional and directional microphone hearing aid processing across a range of signal-to-noise ratios (SNRs). A primary motivation for the study was to determine whether SNR might be used to represent distance between talker and listener in automatic directionality algorithms based on scene analysis. Participants were current hearing aid users who had experience either with omnidirectional microphone hearing aids only or with manually switchable omnidirectional/directional hearing aids. Using IEEE/Harvard sentences from a front loudspeaker and speech-shaped noise from three loudspeakers located behind and to the sides of the listener, the directional advantage (DA) was obtained at 11 SNRs ranging from -15 dB to +15 dB in 3 dB steps. Preferences for the two microphone modes at each of the 11 SNRs were also obtained using concatenated IEEE sentences presented in the speech-shaped noise. Results revealed a DA across a broad range of SNRs, although directional processing provided the greatest benefit within a narrower range of SNRs. Mean data suggested that microphone preferences were determined largely by the DA, such that the greater the benefit to speech intelligibility provided by the directional microphones, the more likely the listeners were to prefer that processing mode. However, inspection of the individual data revealed that highly predictive relationships did not exist for most individual participants. Few preferences for omnidirectional processing were observed. Overall, the results did not support the use of SNR to estimate the effects of distance between talker and listener in automatic directionality algorithms.
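The directional advantage (DA) described above is the difference between directional and omnidirectional intelligibility scores at a given SNR. A minimal sketch of that arithmetic over the 11 test SNRs, using hypothetical placeholder scores rather than the study's data:

```python
# Illustrative sketch: directional advantage (DA) per SNR, with placeholder
# percent-correct scores (the study's actual data are not reproduced here).
snrs_db = list(range(-15, 16, 3))                # 11 SNRs, -15 to +15 dB

omni = {snr: 0.0 for snr in snrs_db}             # fill with measured scores
directional = {snr: 0.0 for snr in snrs_db}      # fill with measured scores

directional_advantage = {snr: directional[snr] - omni[snr] for snr in snrs_db}
```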