The perception of syllable-initial stop consonants as voiced (/b/, /d/, /g/) or voiceless (/p/, /t/, /k/) was shown to depend on the prevailing rate of articulation. Reducing the articulatory rate of a precursor phrase causes a greater proportion of test consonants to be identified as voiced. Subsequent experiments demonstrated that this effect depends almost entirely on variation in the duration of the syllable immediately preceding the test syllable; this duration, the duration of the intervening silent stop closure, and the duration of the test syllable itself all influenced the identification of the stop as voiced or voiceless. Variation in the tempo of a nonspeech melody produced no effect on the perception of embedded test syllables. The manipulations that produce the major part of the influence of rate do so not by changing the context in which the stop is perceived, but by changing temporal concomitants of the constriction, occlusion, and release phases of the articulation of the stop itself. For this reason, an explanation of such effects based on extrinsic timing in perception is found wanting. Timing should, in the main, be regarded as intrinsic to the acoustical specification of phonetic events, a view that is compatible with recent reformulations of the problem of timing control in speech production.
If two vowels with different fundamental frequencies (f0's) are presented simultaneously and monaurally, listeners often hear two talkers producing different vowels on different pitches. This paper describes the evaluation of four computational models of the auditory and perceptual processes which may underlie this ability. Each model involves four stages: (i) frequency analysis using an "auditory" filter bank, (ii) determination of the pitches present in the stimulus, (iii) segregation of the competing speech sources by grouping energy associated with each pitch to create two derived spectral patterns, and (iv) classification of the derived spectral patterns to predict the probabilities of listeners' vowel-identification responses. The "place" models carry out the operations of pitch determination and spectral segregation by analyzing the distribution of rms levels across the channels of the filter bank. The "place-time" models carry out these operations by analyzing the periodicities in the waveforms in each channel. In their "linear" versions, the place and place-time models operate directly on the waveforms emerging from the filters. In their "nonlinear" versions, analogous operations are applied to the output of an additional stage which applies a compressive nonlinearity to the filtered waveforms. Compared to the other three models, the nonlinear place-time model provides the most accurate estimates of the f0's of pairs of concurrent synthetic vowels and comes closest to predicting the identification responses of listeners to such stimuli. Although the model has several limitations, the results are compatible with the idea that a place-time analysis is used to segregate competing sound sources.
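The staged pipeline described above can be made concrete in outline. The sketch below is not the authors' implementation: it stands in a crude bandpass filter bank for the "auditory" filters, uses a summary autocorrelation for the place-time pitch analysis, and assigns channels to pitches by their dominant periodicity. The channel parameters, peak-picking rules, and the omission of the final classification stage are all assumptions made for illustration.

```python
# Illustrative sketch of a "place-time" analysis: filter bank -> per-channel
# periodicity -> two-pitch estimation -> segregation into two spectral patterns.
# All parameters are placeholder assumptions, not values from the paper.
import numpy as np
from scipy.signal import butter, lfilter

FS = 16000  # sampling rate in Hz (assumed)

def filter_bank(x, centre_freqs, bandwidth_ratio=0.2):
    """Crude bandpass filter bank standing in for an 'auditory' filter bank."""
    channels = []
    for cf in centre_freqs:
        low, high = cf * (1 - bandwidth_ratio), cf * (1 + bandwidth_ratio)
        b, a = butter(2, [low / (FS / 2), high / (FS / 2)], btype="band")
        channels.append(lfilter(b, a, x))
    return np.array(channels)

def summary_autocorrelation(channels, max_lag):
    """Sum normalized autocorrelations across channels (place-time analysis)."""
    summary = np.zeros(max_lag)
    for ch in channels:
        ac = np.correlate(ch, ch, mode="full")[len(ch) - 1:len(ch) - 1 + max_lag]
        if ac[0] > 0:
            summary += ac / ac[0]
    return summary

def estimate_two_f0s(summary, fs=FS, fmin=80.0, fmax=400.0):
    """Pick the two largest summary-autocorrelation peaks in a plausible f0 range."""
    lags = np.arange(len(summary))
    valid = (lags >= fs / fmax) & (lags <= fs / fmin)
    candidate_lags = lags[valid][np.argsort(summary[valid])[::-1]]
    f0s = [fs / candidate_lags[0]]
    for lag in candidate_lags[1:]:
        # require the second period to differ by ~5% (assumed threshold)
        if abs(lag - candidate_lags[0]) > 0.05 * candidate_lags[0]:
            f0s.append(fs / lag)
            break
    return f0s

def segregate(channels, f0s, fs=FS):
    """Assign each channel's rms level to the f0 whose period dominates its
    autocorrelation, yielding two derived spectral patterns."""
    patterns = [np.zeros(len(channels)) for _ in f0s]
    for i, ch in enumerate(channels):
        ac = np.correlate(ch, ch, mode="full")[len(ch) - 1:]
        scores = [ac[int(round(fs / f0))] for f0 in f0s]
        patterns[int(np.argmax(scores))][i] = np.sqrt(np.mean(ch ** 2))
    return patterns

# Example use (assumed parameters) on a 100-ms frame `x` of a two-vowel mixture:
#   cfs = np.geomspace(100, 4000, 30)
#   chans = filter_bank(x, cfs)
#   f0s = estimate_two_f0s(summary_autocorrelation(chans, max_lag=FS // 50))
#   patterns = segregate(chans, f0s)   # classification stage (iv) omitted here
```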
The intelligibility of sentences presented in noise improves when the listener can view the talker's face. Our aims were to quantify this benefit, and to relate it to individual differences among subjects in lipreading ability and among sentences in lipreading difficulty. Auditory and audiovisual speech-reception thresholds (SRTs) were measured in 20 listeners with normal hearing. Sixty sentences, selected to range from easy to hard in the difficulty with which they could be lipread (with vision alone), were presented for identification in white noise. Using the ascending method of limits, the SRT was defined as the lowest signal-to-noise ratio at which all three 'key words' in each sentence could be identified correctly. Measured as the difference in dB between auditory-alone and audiovisual SRTs, 'audiovisual benefit' averaged 11 dB, ranging from 6 to 15 dB among subjects, and from 3 to 22 dB among sentences. As predicted, audiovisual benefit is a measure of lipreading ability: it was highly correlated with visual-alone performance (n = 20, r = 0.86, p < 0.01). Likewise, sentences that were easiest to lipread gave a greater benefit from vision in audiovisual conditions than did sentences that were hard to lipread (n = 60, r = 0.92, p < 0.01). The results establish the basis of an efficient test of speech-reception disability in which measures are freed from the floor and ceiling effects encountered when percentage correct is used as the dependent variable.
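For concreteness, the benefit measure reduces to simple arithmetic: for each subject (or sentence) it is the auditory-alone SRT minus the audiovisual SRT, in dB, and its relation to lipreading ability is summarized by a Pearson correlation with visual-alone scores. The snippet below illustrates that computation with hypothetical numbers; it is not the authors' analysis code.

```python
# Hypothetical example of computing audiovisual benefit and its correlation
# with visual-alone (lipreading) performance.
import numpy as np

auditory_srt_db = np.array([-2.0, 0.5, -1.0, 1.5])         # SRT, audio alone (dB SNR)
audiovisual_srt_db = np.array([-13.0, -8.0, -12.5, -6.0])   # SRT, audio + face (dB SNR)
visual_alone_pct = np.array([55.0, 20.0, 60.0, 10.0])       # lipreading score (% correct)

benefit_db = auditory_srt_db - audiovisual_srt_db           # e.g. -2 - (-13) = 11 dB
r = np.corrcoef(benefit_db, visual_alone_pct)[0, 1]
print(f"mean benefit = {benefit_db.mean():.1f} dB, r = {r:.2f}")
```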
The accuracy with which naive listeners could report sentences presented 12 dB below a background of continuous prose was compared with accuracy in four audio-visually supplemented conditions. With monochrome displays of the talker showing (i) the face, (ii) the lips, and (iii) four points at the centres of the lips and the corners of the mouth, accuracy improved by 43%, 31%, and 8%, respectively. No improvement was produced by optical information on syllabic timing. The results suggest that optical concomitants of articulation specify linguistic information to normal listeners. This conclusion was reinforced in a second experiment in which identification functions were obtained for continua of synthetic syllables ranging between [aba], [ada], and [aga], presented both in isolation and in combination with video recordings. Audio-visually, [b] was perceived only when lip closure was specified optically and, when lip closure was specified optically, [b] was generally perceived. Perceivers appear to make use of articulatory constraints on the combined audio-visual specification of phonetic events, suggesting that optical and acoustical displays are co-perceived in a common metric closely related to that of articulatory dynamics.