This paper investigates whether compensation for coarticulation in speech perception can be mediated by native language. Substantial work has studied compensation as a consequence of general auditory processing or as a consequence of a perceptual gestural recovery process. The role of linguistic experience in compensation for coarticulation potentially cross-cuts this controversy and may shed light on the phonetic basis of compensation. In Experiment 1, native French and English listeners identified the initial sound of fricative-vowel syllables drawn from a continuum from [s] to [S], paired with the vowels [a, u, y]. French speakers are familiar with the rounded vowel [y], while it is unfamiliar to English speakers. Both groups showed compensation (a shifted 's'/'sh' boundary relative to [a]) for the vowel [u], but only the French-speaking listeners reliably compensated for the vowel [y]. In Experiment 2, twenty-four American English listeners judged videos in which the audio stimuli of Experiment 1 served as soundtracks for a face saying [s]V, [S]V, or a visual blend of the two fricatives. Videos with [S] visual information induced significantly more "S" responses than those made from visual [s] tokens. However, as in Experiment 1, English-speaking listeners reliably compensated for [u] but not for the unfamiliar vowel [y]. Listeners thus used visual consonant information for categorization but did not use visual vowel information for compensation for coarticulation. The results indicate that perceptual compensation for coarticulation is a language-specific effect tied to the listener's experience with the conditioning phonetic environment.
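As an illustration of how the boundary shift reported above is typically quantified, the sketch below fits a logistic function to 's'-response rates along an [s]-[S] continuum and compares the 50% crossover points across vowel contexts. This is a minimal sketch, not the paper's actual analysis: the number of continuum steps, the response proportions, and the use of scipy's curve_fit are all assumptions for illustration.

```python
# Minimal sketch (assumed data, not the paper's): estimate the 's'/'sh'
# category boundary as the 50% crossover of a fitted logistic function,
# separately for each vowel context, and compare the boundaries.
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, x0, k):
    """Probability of an 's' response at continuum step x; x0 is the boundary."""
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

steps = np.arange(1, 10)  # hypothetical 9-step [s]-[S] continuum
# Hypothetical proportions of 's' responses in two vowel contexts:
p_s_a = np.array([.98, .95, .90, .70, .45, .25, .10, .05, .02])  # vowel [a]
p_s_u = np.array([.99, .97, .94, .85, .65, .40, .20, .08, .03])  # vowel [u]

(b_a, _), _ = curve_fit(logistic, steps, p_s_a, p0=[5, -1])
(b_u, _), _ = curve_fit(logistic, steps, p_s_u, p0=[5, -1])

# A boundary shifted toward the [S] end in the [u] context (more 's'
# responses overall) is the signature of compensation for coarticulation.
print(f"boundary [a]: {b_a:.2f}, boundary [u]: {b_u:.2f}, shift: {b_u - b_a:.2f}")
```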
In this presentation, I demonstrate that certain nonspeech sounds can have perceptual phonetic value. I focus on a single phonetic/articulatory feature, lip rounding, and its detection in simple auditory stimuli. Behavioral experiments show that a rounding percept is possible for two types of nonspeech. One stimulus type, which yields the more robust response, is a complex periodic source filtered by a single narrow band reminiscent of a speech formant. The resulting nonspeech varies in perceived roundedness depending on the filter's frequency, corresponding roughly with F2. The other stimulus type is a pure tone modulated upward in frequency. Preliminary results suggest that rounding can indeed be perceived in these sounds, but only with specific modulation rates within a certain frequency range. These findings indicate that minimally simple auditory objects, including pure tones and filtered bands, can be sufficient to encode phonetic information. Additionally, the two cue types diverge in how they trigger this percept: a filtered band works as a static spectral cue, whereas a pure tone requires spectrotemporal modulation. This observation is consistent with findings that there are auditory spectrotemporal receptive fields (STRFs) specifically sensitive to modulation, and with the theoretical perspective that auditory organization directly predicts the processing of speech.
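As a rough illustration of the two stimulus classes described above, the sketch below synthesizes (1) a harmonic complex passed through a single narrow band-pass filter at an F2-like center frequency and (2) a pure tone swept upward in frequency. All parameters here (sample rate, duration, fundamental, filter design, sweep range) are assumptions for illustration, not the stimuli used in the experiments.

```python
# Minimal sketch (assumed parameters) of the two nonspeech stimulus types:
# (1) a complex periodic source through one narrow band-pass filter placed
#     at an F2-like frequency, and
# (2) a pure tone with an upward linear frequency sweep.
import numpy as np
from scipy.signal import butter, sosfilt

FS = 16000          # sample rate in Hz (assumed)
DUR = 0.3           # stimulus duration in seconds (assumed)
t = np.arange(int(FS * DUR)) / FS

# (1) Harmonic complex at f0 = 100 Hz, band-pass filtered around 900 Hz;
# lowering the band's center frequency would mimic a lower, "rounder" F2.
f0 = 100.0
source = sum(np.sin(2 * np.pi * f0 * h * t) for h in range(1, 60))
sos = butter(4, [800, 1000], btype="bandpass", fs=FS, output="sos")
band_stim = sosfilt(sos, source)

# (2) Pure tone swept linearly upward from 800 Hz to 1200 Hz; the phase is
# the integral of the instantaneous frequency f(t) = f_start + (df/DUR) * t.
f_start, f_end = 800.0, 1200.0
phase = 2 * np.pi * (f_start * t + (f_end - f_start) * t**2 / (2 * DUR))
sweep_stim = np.sin(phase)
```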