Visual speech facilitates auditory speech perception, but the visual cues responsible for these effects and the crossmodal information they provide remain unclear. Because visible articulators shape the spectral content of auditory speech, we hypothesized that listeners may be able to extract spectrotemporal information from visual speech to facilitate auditory speech perception. To uncover statistical regularities that could subserve such facilitations, we compared the resonant frequency of the oral cavity to the shape of the oral aperture during speech. We found that the time-frequency dynamics of oral resonances could be recovered with unexpectedly high precision from the shape of the mouth during speech. Because both auditory frequency modulations and visual shape properties are neurally encoded as mid-level perceptual features, we hypothesized that this feature-level correspondence would allow for spectrotemporal information to be recovered from visual speech without reference to higher order (e.g., phonemic) speech representations. Isolating these features from other speech cues, we found that speech-based shape deformations improved sensitivity for corresponding frequency modulations, suggesting that the perceptual system exploits crossmodal correlations in mid-level feature representations to enhance speech perception. To test whether this correspondence could be used to improve comprehension, we selectively degraded the spectral or temporal dimensions of auditory sentence spectrograms to assess how well visual speech facilitated comprehension under each degradation condition. Visual speech produced drastically larger enhancements during spectral degradation, suggesting a condition-specific facilitation effect driven by crossmodal recovery of auditory speech spectra. Visual speech may therefore facilitate perception by crossmodally restoring degraded spectrotemporal signals in speech.
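To make the degradation manipulation concrete, the following is a minimal illustrative sketch (not the authors' actual pipeline, which the text does not specify) of how the spectral versus temporal dimension of a speech spectrogram could be selectively degraded, here by Gaussian smoothing along the frequency axis or the time axis. The function name, window parameters, and smoothing severity are all hypothetical choices for illustration only.

```python
# Illustrative sketch: selectively degrade the spectral vs. temporal
# dimension of a magnitude spectrogram by blurring along one axis.
import numpy as np
from scipy.signal import spectrogram
from scipy.ndimage import gaussian_filter1d

def degrade_spectrogram(audio, fs, mode="spectral", sigma=8):
    """Compute a magnitude spectrogram and blur it along one dimension.

    mode="spectral": smear detail across frequency (axis 0).
    mode="temporal": smear detail across time (axis 1).
    `sigma` (in frequency bins or time frames) controls degradation
    severity; the default here is arbitrary.
    """
    f, t, sxx = spectrogram(audio, fs=fs, nperseg=512, noverlap=384)
    axis = 0 if mode == "spectral" else 1
    degraded = gaussian_filter1d(sxx, sigma=sigma, axis=axis)
    return f, t, degraded

# Example: degrade a 1-second noise "utterance" along each dimension.
fs = 16000
audio = np.random.randn(fs)
_, _, spec_degraded = degrade_spectrogram(audio, fs, mode="spectral")
_, _, temp_degraded = degrade_spectrogram(audio, fs, mode="temporal")
```

Under this sketch, spectral degradation blurs frequency detail while preserving the temporal envelope, and temporal degradation does the reverse, which is the contrast the comprehension experiment described above is designed to exploit.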
Speech perception is a central component of social communication. While principally an auditory process, accurate speech perception in everyday settings is supported by meaningful information extracted from visual cues (e.g., speech content, timing, and speaker identity). Auditory speech signals are conveyed rapidly during natural speech (3-7 syllables per second; Chandrasekaran et al., 2009), making the identification of individual speech sounds a computationally challenging task (Elliott and Theunissen, 2009). Easing the complexity of this process, audiovisual signals during face-to-face communication help predict and constrain perceptual inferences about speech sounds in both a bottom-up and top-down manner (Bernstein