The spectral envelope is a major determinant of the perceptual identity of many classes of sound including speech. When sounds are transmitted from the source to the listener, the spectral envelope is invariably and diversely distorted, by factors such as room reverberation. Perceptual compensation for spectral-envelope distortion was investigated here. Carrier sounds were distorted by spectral envelope difference filters whose frequency response is the spectral envelope of one vowel minus the spectral envelope of another. The filter /I/ minus /e/ and its inverse were used. Subjects identified a test sound that followed the carrier. The test sound was drawn from an /Itch/ to /etch/ continuum. Perceptual compensation produces a phoneme boundary difference between /I/ minus /e/ and its inverse. Carriers were the phrase "the next word is" spoken by the same (male) speaker as the test sounds, signal-correlated noise derived from this phrase, the same phrase spoken by a female speaker, male and female versions played backwards, and a repeated end-point vowel. The carrier and test were presented to the same ear, to different ears, and from different apparent directions (by varying interaural time delay). The results show that compensation is unlike peripheral phenomena, such as adaptation, and unlike phonetic perceptual phenomena. The evidence favors a central, auditory mechanism.
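The difference-filter construction described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the spectral envelopes are available as decibel values sampled at a shared set of frequencies, so that "one envelope minus another" is a per-frequency subtraction and filtering is a per-frequency addition of gain. The function names and all numeric values are hypothetical.

```python
def difference_filter_db(envelope_a_db, envelope_b_db):
    """Gain in dB at each frequency: envelope A minus envelope B."""
    if len(envelope_a_db) != len(envelope_b_db):
        raise ValueError("envelopes must be sampled at the same frequencies")
    return [a - b for a, b in zip(envelope_a_db, envelope_b_db)]


def inverse_filter_db(filter_db):
    """The inverse filter negates the gain at every frequency."""
    return [-g for g in filter_db]


def apply_filter_db(spectrum_db, filter_db):
    """Filtering in the dB domain adds the filter gain to the spectrum."""
    return [s + g for s, g in zip(spectrum_db, filter_db)]


# Illustrative (made-up) envelope and carrier levels in dB, not measured data.
envelope_i_db = [30.0, 22.0, 10.0, 18.0, 25.0, 12.0]
envelope_e_db = [24.0, 28.0, 14.0, 20.0, 19.0, 11.0]
carrier_db = [40.0, 35.0, 30.0, 28.0, 26.0, 20.0]

i_minus_e = difference_filter_db(envelope_i_db, envelope_e_db)
e_minus_i = inverse_filter_db(i_minus_e)

distorted = apply_filter_db(carrier_db, i_minus_e)
restored = apply_filter_db(distorted, e_minus_i)
```

In this sketch the inverse filter exactly undoes the distortion, which is why a boundary difference between the two filter conditions can be read as a measure of compensation.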
Listeners were asked to identify modified recordings of the words "sir" and "stir," which were spoken by an adult male British-English speaker. Steps along a continuum between the words were obtained by a pointwise interpolation of their temporal-envelopes. These test words were embedded in a longer "context" utterance, and played with different amounts of reverberation. Increasing only the test-word's reverberation shifts the listener's category boundary so that more "sir"-identifications are made. This effect reduces when the context's reverberation is also increased, indicating perceptual compensation that is informed by the context. Experiment 1 finds that compensation is more prominent in rapid speech, that it varies between rooms, that it is more prominent when the test-word's reverberation is high, and that it increases with the context's reverberation. Further experiments show that compensation persists when the room is switched between the context and the test word, when presentation is monaural, and when the context is reversed. However, compensation reduces when the context's reverberation pattern is reversed, as well as when noise-versions of the context are used. "Tails" that reverberation introduces at the ends of sounds and at spectral transitions may inform the compensation mechanism about the amount of reflected sound in the signal.
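A pointwise interpolation between two temporal envelopes might look like the sketch below. The abstract does not say which interpolation rule was used, so linear weighting is an assumption here, as are the function names and the toy envelope values; the two envelopes are assumed to be time-aligned and of equal length.

```python
def interpolate_envelopes(env_a, env_b, weight):
    """Blend two time-aligned envelopes sample by sample.

    weight = 0.0 returns env_a, weight = 1.0 returns env_b.
    Linear blending is an assumption, not a detail taken from the abstract.
    """
    if len(env_a) != len(env_b):
        raise ValueError("envelopes must have the same number of samples")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie between 0 and 1")
    return [(1.0 - weight) * a + weight * b for a, b in zip(env_a, env_b)]


def make_continuum(env_a, env_b, n_steps):
    """Return n_steps envelopes running from env_a to env_b inclusive."""
    if n_steps < 2:
        raise ValueError("a continuum needs at least two steps")
    return [
        interpolate_envelopes(env_a, env_b, k / (n_steps - 1))
        for k in range(n_steps)
    ]


# Toy envelopes (made-up values): the second has a dip where the first does not.
sir_env = [0.0, 0.8, 0.7, 0.6, 0.9, 1.0]
stir_env = [0.0, 0.8, 0.1, 0.0, 0.9, 1.0]

continuum = make_continuum(sir_env, stir_env, 5)
```

Each step of the continuum differs from its neighbours only in the depth of the dip, so a listener's category boundary can be located as the step at which identifications change from one word to the other.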
This study asks whether perceptual mechanisms that compensate for the spectral-envelope distortion of transmission channels also contribute to compensation for speaker differences. Subjects identified test words that were played after a carrier sentence. In some conditions the carriers were synthesized with F1 in low- and high-frequency ranges and in others they were distorted by filters whose frequency response is the spectral envelope of one vowel minus the spectral envelope of another. The filter /I/ minus /ɛ/ and its inverse were used. Test words were drawn from an /Itch/ to /ɛtch/ continuum. Carriers filtered by /I/ minus /ɛ/ and its inverse give a phoneme boundary difference, indicating compensation for spectral envelope distortion. A phoneme boundary difference also occurs between carriers with F1 in low and high ranges, indicating compensation for speaker differences. Neither of these effects is reduced by playing the carrier backwards, even though a measurement of the perceived naturalness of carriers is sharply reduced by this manipulation. Analysis of carriers synthesized with low and high F1 showed that they have different long-term spectra, and subsequent experiments used time-stationary filters to alter this characteristic. The results showed that the long-term spectra of the carriers govern their influence on the identity of subsequent test sounds. However, measurements of perceptual confusions among the carriers and of perceived talker-differences between carriers revealed that other, time-varying factors are more important for voice identification.
When speech is in competition with interfering sources in rooms, monaural indicators of intelligibility fail to take account of the listener's abilities to separate target speech from interfering sounds using the binaural system. In order to incorporate these segregation abilities and their susceptibility to reverberation, Lavandier and Culling [J. Acoust. Soc. Am. 127, 387-399 (2010)] proposed a model which combines effects of better-ear listening and binaural unmasking. A computationally efficient version of this model is evaluated here under more realistic conditions that include head shadow, multiple stationary noise sources, and real-room acoustics. Three experiments are presented in which speech reception thresholds were measured in the presence of one to three interferers using real-room listening over headphones, simulated by convolving anechoic stimuli with binaural room impulse-responses measured with dummy-head transducers in five rooms. Without fitting any parameter of the model, there was close correspondence between measured and predicted differences in threshold across all tested conditions. The model's components of better-ear listening and binaural unmasking were validated both in isolation and in combination. The computational efficiency of this prediction method allows the generation of complex "intelligibility maps" from room designs.
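One way to picture how better-ear listening and binaural unmasking could be combined is sketched below. This is not the published model: it assumes that per-band signal-to-noise ratios at each ear and a per-band unmasking advantage have already been computed elsewhere, that the two effects add in decibels within each band, and that bands are combined by a weighted average. The function name, the weights, and all numbers are hypothetical.

```python
def effective_snr_db(left_snr_db, right_snr_db, unmasking_db, band_weights):
    """Weighted average over bands of (better-ear SNR + unmasking advantage).

    All inputs are per-frequency-band lists of equal length. The additive
    combination and the weighting scheme are assumptions of this sketch.
    """
    n_bands = len(left_snr_db)
    if not (len(right_snr_db) == len(unmasking_db) == len(band_weights) == n_bands):
        raise ValueError("all inputs must cover the same frequency bands")
    total_weight = sum(band_weights)
    if total_weight <= 0:
        raise ValueError("band weights must sum to a positive value")

    total = 0.0
    for left, right, unmask, weight in zip(
        left_snr_db, right_snr_db, unmasking_db, band_weights
    ):
        better_ear = max(left, right)  # pick the ear with the higher SNR
        total += weight * (better_ear + unmask)
    return total / total_weight


# Made-up values for four frequency bands.
left = [-6.0, -3.0, 0.0, 2.0]
right = [-2.0, -5.0, 1.0, -1.0]
unmasking = [3.0, 2.0, 1.0, 0.0]
weights = [1.0, 1.0, 1.0, 1.0]

snr = effective_snr_db(left, right, unmasking, weights)
```

Evaluating such a function for each listener position in a room is the kind of cheap per-point computation that would make an "intelligibility map" feasible.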
Features in a sound's spectral envelope are important for perceptual identification but they are likely to be accompanied by spurious features due to distortion by the transmission channel between source and listener. Previous experiments have demonstrated that there is perceptual compensation for this distortion, and the present experiments ask whether the compensation involves a separation of spurious and salient features. Listeners identified words containing a vowel test sound in an /æpt/ to /ɒpt/ continuum, with a carrier phrase before each word. Effects of transmission channels were simulated by filtering the carrier and the /pt/ following the test sound. Filters were pairs with frequency responses that were the difference of the spectral envelopes from the end-point vowels. Contrasts were altered by multiplying decibel values of the carrier filter's frequency response or the test sound's spectral envelope by a positive number. This keeps features such as peaks at the same frequencies but changes the difference in level between peaks and valleys. When the contrasts of the carrier filters and test sound were the same, the continuum's phoneme boundary was shifted in a manner consistent with a perceptual compensation for the filters that affects the neighboring test sound. However, this shift decreased when the carrier-filter's contrast was less than that of the test sound, and increased slightly when the test-sound's contrast was less than the carrier-filter's contrast. Therefore, the amount of compensation increases with the amount of distortion, even when spectral features such as peaks are kept at the same frequencies. So compensation seems to occur before any perceptual extraction of these features.
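The contrast manipulation described above, multiplying decibel values by a positive number, can be illustrated with a short sketch. The function names and the example frequency response are hypothetical; the point is only that scaling in decibels leaves peaks at the same positions while changing the peak-to-valley level difference.

```python
def scale_contrast(values_db, factor):
    """Multiply every dB value by a positive factor."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return [factor * v for v in values_db]


def peak_indices(values):
    """Indices of interior samples that exceed both neighbours."""
    return [
        i
        for i in range(1, len(values) - 1)
        if values[i] > values[i - 1] and values[i] > values[i + 1]
    ]


def peak_to_valley_db(values_db):
    """Level difference between the highest and lowest points."""
    return max(values_db) - min(values_db)


# Made-up frequency response in dB, one value per frequency sample.
filter_db = [6.0, -6.0, -4.0, 2.0, 6.0, 1.0]
reduced = scale_contrast(filter_db, 0.5)
```

Here `reduced` has its peak at the same index as `filter_db`, but the peak-to-valley difference is halved from 12 dB to 6 dB.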