Highlights
- Delta/theta band EEG tracks band-limited speech-derived envelopes similarly to real speech.
- Low- and high-frequency speech-derived envelopes are represented differentially.
- High-frequency-derived envelopes are more susceptible to attentional and contextual manipulations.
- Delta band tracking shifts towards low-frequency-derived envelopes with more acoustic detail.
Speech is an intrinsically multisensory signal, and seeing the speaker’s lips forms a cornerstone of communication in acoustically impoverished environments. Still, it remains unclear how the brain exploits visual speech for comprehension. Previous work has debated whether lip signals are mainly processed along the auditory pathways or whether the visual system directly implements speech-related processes. To probe this, we systematically characterized dynamic representations of multiple acoustic and visual speech-derived features in source-localized MEG recordings obtained while participants listened to speech or viewed silent speech. Using a mutual-information framework, we provide a comprehensive assessment of how well temporal and occipital cortices reflect the physically presented signals, as well as unique aspects of acoustic features that were physically absent but may be critical for comprehension. Our results demonstrate that both cortices feature a functionally specific form of multisensory restoration: during lip reading, they reflect unheard acoustic features, independent of co-existing representations of the visible lip movements. This restoration emphasizes the unheard pitch signature in occipital cortex and the speech envelope in temporal cortex, and is predictive of lip-reading performance. These findings suggest that when seeing the speaker’s lips, the brain engages both visual and auditory pathways to support comprehension by exploiting multisensory correspondences between lip movements and spectro-temporal acoustic cues.
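To illustrate the logic of relating a neural time course to a speech feature with mutual information, here is a minimal sketch using a simple binned (histogram) estimator. This is not the authors' pipeline, which would typically use a more refined estimator; the variable names (meg_source, speech_feature), the binning scheme, the lag, and the simulated data are assumptions for illustration only.

```python
# Hedged sketch: binned mutual information between a (hypothetical)
# source-localized MEG time course and a speech feature such as the
# acoustic envelope. Illustrative only; not the study's actual estimator.
import numpy as np

def binned_mutual_information(x, y, n_bins=8):
    """Mutual information (in bits) from a joint histogram of two 1-D signals."""
    # Discretize each signal into roughly equally populated bins (rank-based).
    x_bins = np.digitize(x, np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1]))
    y_bins = np.digitize(y, np.quantile(y, np.linspace(0, 1, n_bins + 1)[1:-1]))
    joint, _, _ = np.histogram2d(x_bins, y_bins, bins=n_bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nonzero = pxy > 0
    return float(np.sum(pxy[nonzero] * np.log2(pxy[nonzero] / (px @ py)[nonzero])))

# Example: the simulated MEG signal is a delayed, noisy copy of the speech
# feature, so MI is clearly positive when evaluated at the matching lag.
rng = np.random.default_rng(0)
speech_feature = rng.standard_normal(6000)      # e.g., an acoustic envelope
lag = 10                                        # ~100 ms at 100 Hz sampling
meg_source = np.roll(speech_feature, lag) + rng.standard_normal(6000)
print(binned_mutual_information(meg_source[lag:], speech_feature[:-lag]))
```

In practice, such lag-resolved MI values would be computed per cortical source and per speech feature, which is what allows physically absent acoustic features to be tested against co-existing representations of the visible lip movements.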
The representation of speech in the brain is often examined by measuring the alignment of rhythmic brain activity to the speech envelope. To conveniently quantify this alignment (termed ‘speech tracking’), many studies consider the overall speech envelope, which combines acoustic fluctuations across the spectral range. Using EEG recordings, we show that relying on this overall envelope provides a distorted picture of speech encoding. We systematically investigated the encoding of spectrally limited speech envelopes, presented via individual and multiple noise carriers, in the human brain. Tracking in the 1 to 6 Hz EEG bands differentially reflected low (0.2-0.83 kHz) and high (2.66-8 kHz) frequency envelopes. This differential tracking was independent of the specific carrier frequency but sensitive to attentional manipulations, and reflects a context-dependent emphasis of information from distinct spectral ranges of the speech envelope in low-frequency brain activity. As low- and high-frequency speech envelopes relate to distinct phonemic features, our results suggest that functionally distinct processes contribute to speech tracking in the same EEG bands and are easily confounded when considering the overall speech envelope.
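A minimal sketch of how a band-limited speech envelope can be derived is shown below, assuming a standard band-pass plus Hilbert-envelope approach with down-sampling to an EEG-compatible rate. The filter order, band edges, and output rate are illustrative choices, not the study's exact settings.

```python
# Hedged sketch: band-limited envelope extraction (band-pass filter,
# Hilbert amplitude, down-sample). Parameters are assumptions.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert, resample_poly

def band_limited_envelope(audio, fs_audio, band, fs_out=100):
    """Return the amplitude envelope of `audio` restricted to `band` (Hz)."""
    sos = butter(4, band, btype="bandpass", fs=fs_audio, output="sos")
    band_signal = sosfiltfilt(sos, audio)                 # zero-phase band-pass
    envelope = np.abs(hilbert(band_signal))               # analytic amplitude
    return resample_poly(envelope, fs_out, fs_audio)      # match EEG sampling rate

# Example: low- and high-frequency envelopes of one second of noise "speech"
# (high-band upper edge kept just below the Nyquist frequency).
fs = 16000
audio = np.random.default_rng(1).standard_normal(fs)
low_env = band_limited_envelope(audio, fs, (200, 830))
high_env = band_limited_envelope(audio, fs, (2660, 7900))
```

Envelopes derived this way from separate spectral ranges can then each be related to the 1 to 6 Hz EEG signal, rather than collapsing them into a single overall envelope.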
Hearing is an active process, and recent studies show that even the ear is affected by cognitive states or motor actions. One example is the movement of the eardrum induced by saccadic eye movements, known as "eye movement-related eardrum oscillations" (EMREOs). While these are systematically shaped by the direction and size of saccades, the consequences of saccadic eye movements and their resulting EMREOs for hearing remain unclear. Here we studied their implications for the detection of near-threshold clicks in human participants. Across three experiments, sound detection was not affected by the time of click presentation relative to saccade onset, nor by saccade amplitude or direction. While the EMREOs were shaped by the direction and amplitude of the saccadic movement, inducing covert shifts in spatial attention did not affect the EMREO, suggesting that this signature of active sensing is restricted to overt changes in visual focus. Importantly, in our experiments fluctuations in EMREO amplitude were not related to detection performance, at least when monaural cues were sufficient. Hence, while eye movements may shape the transduction of acoustic information, the behavioral implications remain unclear.
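To make the amplitude-versus-detection comparison concrete, here is a minimal sketch that epochs a simulated ear-canal microphone signal around saccade onsets and compares per-trial RMS amplitude between detected and missed clicks. The epoch window, sampling rate, variable names, and data are assumptions for illustration, not the recorded dataset or the authors' analysis code.

```python
# Hedged sketch: saccade-locked ear-canal epochs, split by detection outcome.
import numpy as np

def saccade_locked_epochs(mic, saccade_onsets, fs, t_pre=0.05, t_post=0.15):
    """Cut ear-canal microphone epochs around each saccade onset (sample indices)."""
    n_pre, n_post = int(t_pre * fs), int(t_post * fs)
    return np.stack([mic[s - n_pre:s + n_post] for s in saccade_onsets])

# Simulated data: 200 trials, 1 kHz ear-canal recording, random detection report.
rng = np.random.default_rng(2)
fs = 1000
mic = rng.standard_normal(fs * 300)
onsets = rng.integers(fs, fs * 299, size=200)
detected = rng.random(200) > 0.5

epochs = saccade_locked_epochs(mic, onsets, fs)
amplitude = np.sqrt(np.mean(epochs ** 2, axis=1))   # per-trial RMS amplitude
print(amplitude[detected].mean(), amplitude[~detected].mean())
```

Under the null result described above, the two means would not differ reliably; on real data one would additionally align clicks to saccade onset and control for saccade amplitude and direction.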