The efficacy of audio-visual interactions in speech perception comes from two kinds of factors. First, at the information level, there is some "complementarity" of audition and vision: It seems that some speech features, mainly concerned with manner of articulation, are best transmitted by the audio channel, while some other features, mostly describing place of articulation, are best transmitted by the video channel. Second, at the information processing level, there is some "synergy" between audition and vision: The audio-visual global identification scores in a number of different tasks involving acoustic noise are generally greater than both the auditory-alone and the visual-alone scores. However, these two properties have so far been demonstrated only in rather global terms. In the present work, audio-visual interactions at the feature level are studied for French oral vowels, which contrast three series, namely front unrounded, front rounded, and back rounded vowels. A set of experiments on the auditory, visual, and audio-visual identification of vowels embedded in various amounts of noise demonstrates that complementarity and synergy in bimodal speech appear to hold for a bundle of individual phonetic features describing place contrasts in oral vowels. At the information level (complementarity), in the audio channel the height feature is the most robust, backness the second most robust, and rounding the least, while in the video channel rounding is better transmitted than height, and backness is almost invisible. At the information processing (synergy) level, transmitted information scores show that all individual features are better transmitted with the ear and the eye together than with each sensor individually.
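The abstract does not say how its transmitted information scores were computed. A common approach in the speech perception literature is to collapse the vowel confusion matrix by one phonetic feature and take the mutual information between stimulus and response, normalized by the stimulus entropy. The sketch below assumes that approach; the function names, the feature labels, and the confusion counts are hypothetical and are not data from the study.

```python
import math


def collapse_by_feature(confusions, labels, feature_of):
    """Pool a stimulus-by-response count matrix into feature classes.

    confusions[i][j] is the number of times stimulus labels[i] was
    identified as labels[j]; feature_of maps each label to a feature value.
    """
    values = sorted(set(feature_of[label] for label in labels))
    index = {value: k for k, value in enumerate(values)}
    pooled = [[0] * len(values) for _ in values]
    for i, row in enumerate(confusions):
        for j, count in enumerate(row):
            pooled[index[feature_of[labels[i]]]][index[feature_of[labels[j]]]] += count
    return values, pooled


def transmitted_information(confusions):
    """Return (T, H_stimulus, T / H_stimulus), with T and H in bits."""
    total = sum(sum(row) for row in confusions)
    row_sums = [sum(row) for row in confusions]
    col_sums = [sum(col) for col in zip(*confusions)]
    t = 0.0
    for i, row in enumerate(confusions):
        for j, count in enumerate(row):
            if count > 0:
                # p_ij * log2(p_ij / (p_i * p_j)), written with raw counts
                t += (count / total) * math.log2(
                    count * total / (row_sums[i] * col_sums[j])
                )
    h_stimulus = -sum(
        (r / total) * math.log2(r / total) for r in row_sums if r > 0
    )
    relative = t / h_stimulus if h_stimulus > 0 else 0.0
    return t, h_stimulus, relative


# Hypothetical counts for three vowels (rows: stimuli, columns: responses).
labels = ["i", "y", "u"]
counts = [[8, 2, 0], [1, 7, 2], [0, 3, 7]]
rounding = {"i": "unrounded", "y": "rounded", "u": "rounded"}

classes, pooled = collapse_by_feature(counts, labels, rounding)
t, h, relative = transmitted_information(pooled)
print(classes, pooled, round(relative, 3))
```

Running the same two functions on the auditory, visual, and audio-visual confusion matrices for each feature would give scores that can be compared across modalities, which is the kind of comparison the abstract reports.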
The work presented by Worley and Abry at this same conference specifies the conditions for anticipation of the labial gesture in /iy/ sequences in French: in an /i/ → /y/ sequence, the rounding gesture can be prepared from the very beginning of the initial vowel, with no acoustic consequence, whereas articulatory-acoustic nonlinearities prevent any anticipation from /y/ toward /i/. Here we study /zV1zV2/ sequences (V1, V2 = /i/ or /y/), and we show that the labial gesture is well perceived and correctly identified visually, with subjects detecting the /i/ → /y/ rounding anticipation without any ambiguity. By contrast, the formant transitions signaling the gesture from the first to the second vowel are not exploited auditorily, because they can be attributed as much to the consonantal gesture /z/ as to the vocalic gesture V1 → V2.
The mechanisms of anticipation for the rounding gesture have been repeatedly investigated in previous works (see, e.g., the controversy between Lubker and Gay concerning the extent of anticipation in Swedish versus American English, or the support found in the French language for Henke's “look ahead” model as exemplified by data from Benguerel's famous “sinistre structure”). Concerning the visual perception of such an anticipation, McGurk [in The Cognitive Representation of Speech, edited by T. Myers et al. (North-Holland, Amsterdam, 1981), p. 336] has briefly mentioned an experiment using reaction times in CV identification. He demonstrated that listeners do take visual information about anticipation into account, and identify CV syllables on the basis of lip movement information prior to their being perceived auditorily. In the present experiment, this same result is found for French, with a different experimental protocol, taking into account simultaneous acoustic and articulatory measurements. Here, /zV1zV2/ trajectories (V1, V2 = /i/ or /y/) were used to compare auditory identification data obtained from gated signals with results of visual identification for front face video images taken every 20 ms along the V1 → V2 trajectory. The following results were found: (1) Lip area transitions clearly show the asymmetry of vowel-to-vowel gestures. The transition from /y/ to /i/ begins at the acoustic onset of the consonant, while the transition from /i/ to /y/ can begin very early in the /i/; (2) this anticipation of the rounding gesture is clearly perceived visually by the subjects, who are able to identify the /y/ vowel before the end of the /i/; and (3) visual detection of the rounding gesture thus comes prior to its auditory detection, which seems, in fact, disturbed by the acoustic mixture of the vocalic gesture (/i/ → /y/ or /y/ → /i/) and the consonantal gesture (/z/).
Implications for the timing and perception of the vowel-to-vowel gesture are discussed.