Vowel perception is influenced by precursor sounds that are resynthesized to shift frequency regions [Ladefoged and Broadbent (1957). J. Acoust. Soc. Am. 29(1), 98-104] or filtered to emphasize narrow [Kiefte and Kluender (2008). J. Acoust. Soc. Am. 123(1), 366-376] or broad frequency regions [Watkins (1991). J. Acoust. Soc. Am. 90(6), 2942-2955]. Spectral differences between filtered precursors and vowel targets are perceptually enhanced, producing spectral contrast effects (e.g., emphasizing spectral properties of /ɪ/ in the precursor elicited more /ɛ/ responses to an /ɪ/-/ɛ/ vowel continuum, and vice versa). Historically, precursors have been processed by high-gain filters, resulting in prominent stable long-term spectral properties. Perceptual sensitivity to subtler but equally reliable spectral properties is unknown. Here, precursor sentences were processed by filters of variable bandwidths and different gains, then followed by vowel sounds varying from /ɪ/-/ɛ/. Contrast effects were widely observed, including when filters had only 100-Hz bandwidth or +5 dB gain. Average filter power was a good predictor of the magnitudes of contrast effects, revealing a close linear correspondence between the prominence of a reliable spectral property and the size of shifts in perceptual responses. High sensitivity to subtle spectral regularities suggests contrast effects are not limited to high-power filters, and thus may be more pervasive in speech perception than previously thought.
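The filter manipulation described above — amplifying a narrow frequency region of a precursor by a fixed gain — can be sketched as follows. This is a simplified stand-in for the study's processing, not its actual implementation: the rectangular band shape, center frequency, 100-Hz bandwidth, and +5 dB gain are illustrative assumptions, and the demo applies the filter to noise rather than to a precursor sentence.

```python
import numpy as np

def band_emphasis_gain(freqs, center, bandwidth, gain_db):
    """Piecewise gain curve: +gain_db inside the band, 0 dB outside.
    A rectangular simplification of the study's filters (assumption)."""
    gain = np.zeros_like(freqs)
    in_band = np.abs(freqs - center) <= bandwidth / 2
    gain[in_band] = gain_db
    return gain

def apply_filter(signal, fs, center, bandwidth, gain_db):
    """Apply the band-emphasis filter by multiplication in the
    frequency domain (zero-phase, since only magnitudes change)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
    gain_curve = band_emphasis_gain(freqs, center, bandwidth, gain_db)
    return np.fft.irfft(spectrum * 10 ** (gain_curve / 20), n=len(signal))

# One second of noise standing in for a precursor sentence.
fs = 16000
noise = np.random.default_rng(0).standard_normal(fs)
# A modest +5 dB peak of 100-Hz bandwidth, centered (arbitrarily) at 450 Hz.
filtered = apply_filter(noise, fs, center=450, bandwidth=100, gain_db=5)
```

Averaging the filter's power over frequency — large for a wide, high-gain peak, small for a narrow, low-gain one — gives the kind of single-number predictor the abstract reports as tracking contrast-effect magnitude.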
Natural sounds are complex, typically changing along multiple acoustic dimensions that covary in accord with physical laws governing sound-producing sources. We report that, after passive exposure to novel complex sounds, highly correlated features initially collapse onto a single perceptual dimension, capturing covariance at the expense of unitary stimulus dimensions. Discriminability of sounds respecting the correlation is maintained, but is temporarily lost for sounds orthogonal or oblique to experienced covariation. Following extended experience, perception of variance not captured by the correlation is restored, but weighted only in proportion to total experienced covariance. A Hebbian neural network model captures some aspects of listener performance; an anti-Hebbian model captures none; but, a principal components analysis model captures the full pattern of results. Predictions from the principal components analysis model also match evolving listener performance in two discrimination tasks absent passive listening. These demonstrations of adaptation to correlated attributes provide direct behavioral evidence for efficient coding.
Keywords: auditory perception | cortical models | perceptual organization
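The principal-components account above can be illustrated with a toy simulation: when two stimulus features covary strongly, the first principal component aligns with the correlation and absorbs nearly all of the variance, leaving little weight on the orthogonal direction — mirroring the initial perceptual collapse the abstract describes. The two "features" here are hypothetical stand-ins, not the stimulus dimensions used in the study.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Two hypothetical acoustic features that covary strongly during
# passive exposure (shared variation plus a little independent noise).
shared = rng.standard_normal(n)
features = np.column_stack([
    shared + 0.1 * rng.standard_normal(n),
    shared + 0.1 * rng.standard_normal(n),
])

# PCA via eigendecomposition of the feature covariance matrix.
cov = np.cov(features, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)          # ascending eigenvalues
variance_explained = eigvals[::-1] / eigvals.sum()
pc1 = eigvecs[:, -1]                            # direction of greatest variance
```

Here `variance_explained[0]` is close to 1 and `pc1` points along the correlated diagonal, so sounds varying orthogonally to the correlation project onto a dimension carrying almost no represented variance — consistent with the temporary loss of discriminability for orthogonal sounds.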
Speech sounds are traditionally divided into consonants and vowels. When only vowels or only consonants are replaced by noise, listeners are more accurate at understanding sentences in which consonants are replaced but vowels remain. From such data, vowels have been suggested to be more important for understanding sentences; however, such conclusions are mitigated by the fact that replaced consonant segments were roughly one-third shorter than vowels. We report two experiments demonstrating that listener performance is better predicted by simple psychoacoustic measures of cochlea-scaled spectral change across time. First, listeners identified sentences in which portions of consonants (C), vowels (V), CV transitions, or VC transitions were replaced by noise. Relative intelligibility was not well accounted for on the basis of Cs, Vs, or their transitions. In a second experiment, distinctions between Cs and Vs were abandoned. Instead, portions of sentences were replaced on the basis of cochlea-scaled spectral entropy (CSE). Sentence segments having relatively high, medium, or low entropy were replaced with noise. Intelligibility decreased linearly as the amount of replaced CSE increased. Duration of signal replaced and proportion of consonants/vowels replaced failed to account for listener data. CSE corresponds closely with the linguistic construct of sonority (or vowel-likeness), which is useful for describing phonological systematicity, especially syllable composition. Results challenge traditional distinctions between consonants and vowels. Speech intelligibility is better predicted by nonlinguistic sensory measures of uncertainty (potential information) than by orthodox physical acoustic measures or linguistic constructs.
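A measure in the spirit of cochlea-scaled spectral change can be sketched as follows: compute band energies on an ERB-number frequency scale for successive short frames, then take the distance between adjacent frames' band profiles. This is a simplified illustration of the general idea, not the published CSE algorithm; the frame length, band count, and distance metric are all assumptions.

```python
import numpy as np

def erb_center_freqs(n_bands, f_lo=100.0, f_hi=7500.0):
    """Center frequencies equally spaced on the ERB-number scale
    (Glasberg & Moore formula)."""
    erb = lambda f: 21.4 * np.log10(4.37e-3 * f + 1.0)
    erb_inv = lambda e: (10 ** (e / 21.4) - 1.0) / 4.37e-3
    return erb_inv(np.linspace(erb(f_lo), erb(f_hi), n_bands))

def cse_profile(signal, fs, frame_len=256, n_bands=33):
    """Frame-to-frame spectral change on a cochlea-scaled axis:
    Euclidean distance between successive frames' log band energies.
    A sketch in the spirit of CSE, not the published algorithm."""
    n_frames = len(signal) // frame_len
    freqs = np.fft.rfftfreq(frame_len, 1 / fs)
    centers = erb_center_freqs(n_bands)
    # Assign each FFT bin to its nearest ERB-spaced band.
    band_of_bin = np.argmin(np.abs(freqs[:, None] - centers[None, :]), axis=1)
    profiles = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        power = np.abs(np.fft.rfft(frame * np.hanning(frame_len))) ** 2
        bands = np.bincount(band_of_bin, weights=power, minlength=n_bands)
        profiles.append(np.log10(bands + 1e-12))
    return np.linalg.norm(np.diff(np.asarray(profiles), axis=0), axis=1)
```

On this measure a steady tone yields near-zero spectral change while a frequency sweep yields large change, capturing the intuition that segments with more spectral change across time carry more potential information.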
Brief experience with reliable spectral characteristics of a listening context can markedly alter perception of subsequent speech sounds, and parallels have been drawn between auditory compensation for listening context and visual color constancy. To better evaluate such an analogy, the generality of acoustic context effects was investigated for sounds with spectral-temporal compositions distinct from speech. Listeners identified nonspeech sounds (extensively edited samples produced by a French horn and a tenor saxophone) following either resynthesized speech or a short passage of music. Preceding contexts were "colored" by spectral envelope difference filters, which were created to emphasize differences between French horn and saxophone spectra. Listeners were more likely to report hearing a saxophone when the stimulus followed a context filtered to emphasize spectral characteristics of the French horn, and vice versa. Despite clear changes in apparent acoustic source, the auditory system calibrated to relatively predictable spectral characteristics of the filtered context, differentially affecting perception of subsequent target nonspeech sounds. This calibration to listening context, and relative indifference to acoustic sources, operates much like visual color constancy, in which reliable properties of the spectrum of illumination are factored out of the perception of color.
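A spectral envelope difference filter of the kind described above can be sketched as the dB difference between two sources' long-term average spectra: the filter boosts regions where one source has more energy than the other. The toy Gaussian spectra below are assumed shapes for illustration, not measured horn or saxophone spectra.

```python
import numpy as np

def difference_filter_db(spec_a, spec_b, floor=1e-10):
    """Spectral envelope difference filter: dB difference between two
    long-term average power spectra. Positive values mark regions where
    source A exceeds source B (illustrative construction)."""
    return 10 * np.log10(np.maximum(spec_a, floor) / np.maximum(spec_b, floor))

# Toy long-term spectra for two instruments (assumed shapes, not data):
# the "horn" concentrated near 500 Hz, the "sax" near 1500 Hz.
freqs = np.linspace(0, 8000, 257)
horn = np.exp(-((freqs - 500) / 400) ** 2) + 0.01
sax = np.exp(-((freqs - 1500) / 600) ** 2) + 0.01
filt_db = difference_filter_db(horn, sax)
```

Applying `filt_db` to a context imposes the horn-minus-sax spectral coloring; negating it imposes the reverse, which is the contrast the abstract exploits: a context colored toward one source pushes perception of the subsequent target toward the other.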
Spectral contrast effects, the perceptual magnification of spectral differences between sounds, have been widely shown to influence speech categorization. However, whether talker information alters spectral contrast effects was recently debated [Laing, Liu, Lotto, and Holt (2012). Front. Psychol. 3, 1-9]. Here, the contributions of reliable spectral properties, between-talker variability, and within-talker variability to spectral contrast effects in vowel categorization were investigated. Listeners heard sentences in three conditions (One Talker/One Sentence, One Talker/200 Sentences, 200 Talkers/200 Sentences) followed by a target vowel (varying from /ɪ/-/ɛ/ in F1, spoken by a single talker). Low-F1 or high-F1 frequency regions in the sentences were amplified to encourage /ɛ/ or /ɪ/ responses, respectively. When sentences contained large reliable spectral peaks (+20 dB; experiment 1), all contrast effect magnitudes were comparable. Talker information did not alter contrast effects following large spectral peaks, which were likely attributed to an external source (e.g., a communication channel) rather than to talkers. When sentences contained modest reliable spectral peaks (+5 dB; experiment 2), contrast effects were smaller following 200 Talkers/200 Sentences than following the single-talker conditions. Constant recalibration to new talkers reduced listeners' sensitivity to modest spectral peaks, diminishing contrast effects. Results bridge conflicting reports of whether talker information influences spectral contrast effects in speech categorization.