In natural conversations, listeners must attend to what others are saying while ignoring extraneous background sounds. Recent studies have used encoding models to predict electroencephalography (EEG) responses to speech in noise-free listening situations, sometimes referred to as "speech tracking." Researchers have analyzed how speech tracking changes with different types of background noise. It is unclear, however, whether neural responses from acoustically rich, naturalistic environments with and without background noise can be generalized to more controlled stimuli. If encoding models for acoustically rich, naturalistic stimuli are generalizable to other tasks, this could aid in data collection from populations of individuals who may not tolerate listening to more controlled and less engaging stimuli for long periods of time. We recorded noninvasive scalp EEG while 17 human participants (8 male/9 female) listened to speech without noise and audiovisual speech stimuli containing overlapping speakers and background sounds. We fit multivariate temporal receptive field encoding models to predict EEG responses to pitch, the acoustic envelope, phonological features, and visual cues in both stimulus conditions. Our results suggested that neural responses to naturalistic stimuli were generalizable to more controlled datasets. EEG responses to speech in isolation were predicted accurately using phonological features alone, while responses to speech in a rich acoustic background were more accurate when including both phonological and acoustic features. Our findings suggest that naturalistic audiovisual stimuli can be used to measure receptive fields that are comparable and generalizable to more controlled audio-only stimuli.
In natural conversations, listeners must attend to what others are saying while ignoring extraneous background sounds. Recent studies have used encoding models to predict electroencephalography (EEG) responses to speech in noise-free listening situations, sometimes referred to as “speech tracking” in EEG. Researchers have analyzed how speech tracking changes with different types of background noise. It is unclear, however, whether neural responses from noisy and naturalistic environments can be generalized to more controlled stimuli. If encoding models for noisy, naturalistic stimuli are generalizable to other tasks, this could aid in data collection from populations who may not tolerate listening to more controlled, less-engaging stimuli for long periods of time. We recorded non-invasive scalp EEG while participants listened to speech without noise and audiovisual speech stimuli containing overlapping speakers and background sounds. We fit multivariate temporal receptive field (mTRF) encoding models to predict EEG responses to pitch, the acoustic envelope, phonological features, and visual cues in both noise-free and noisy stimulus conditions. Our results suggested that neural responses to naturalistic stimuli were generalizable to more controlled data sets. EEG responses to speech in isolation were predicted accurately using phonological features alone, while responses to noisy speech were more accurate when including both phonological and acoustic features. These findings may inform basic science research on speech-in-noise processing. Ultimately, they may also provide insight into auditory processing in people who are hard of hearing, who use a combination of audio and visual cues to understand speech in the presence of noise.Significance StatementUnderstanding spoken language in natural environments requires listeners to parse acoustic and linguistic information in the presence of other distracting stimuli. However, most studies of auditory processing rely on highly controlled stimuli with no background noise, or with background noise inserted at specific times. Here, we compare models where EEG data are predicted based on a combination of acoustic, phonetic, and visual features in highly disparate stimuli – sentences from a speech corpus, and speech embedded within movie trailers. We show that modeling neural responses to highly noisy, audiovisual movies can uncover tuning for acoustic and phonetic information that generalizes to simpler stimuli typically used in sensory neuroscience experiments.
Speaking elicits a suppressed neural response when compared with listening to others' speech, a phenomenon known as speaker-induced suppression (SIS). Previous research has focused on investigating SIS at constrained levels of linguistic representation, such as the individual phoneme and word level. Here, we present scalp EEG data from a dual speech perception and production task where participants read sentences aloud then listened to playback of themselves reading those sentences. Playback was separated into immediate repetition of the previous trial and randomized repetition of a former trial to investigate if forward modeling of responses during passive listening suppresses the neural response. Concurrent EMG was recorded to control for movement artifact during speech production. In line with previous research, ERP analyses at the sentence level demonstrated suppression of early auditory components of the EEG for production compared with perception. To evaluate whether linguistic abstractions (in the form of phonological feature tuning) are suppressed during speech production alongside lower-level acoustic information, we fit linear encoding models that predicted scalp EEG based on phonological features, EMG activity, and task condition. We found that phonological features were encoded similarly between production and perception. However, this similarity was only observed when controlling for movement by using the EMG response as an additional regressor. Our results suggest that SIS operates at a sensory representational level and is dissociated from higher order cognitive and linguistic processing that takes place during speech perception and production. We also detail some important considerations when analyzing EEG during continuous speech production.
Speaking elicits a suppressed neural response when compared to listening to others' speech, a phenomenon known as speaker-induced suppression (SIS). Previous research has focused on investigating SIS at constrained levels of linguistic representation, such as the individual phoneme and word level. Here we present scalp EEG data from a dual speech perception and production task where participants read sentences aloud then listened to playback of themselves reading those sentences. Playback was separated into predictable repetition of the previous trial and unpredictable, randomized repetition of a former trial to investigate the role predictive processing plays in SIS. Concurrent EMG was recorded to control for movement artifact during speech production. In line with previous research, event-related potential analyses at the sentence level demonstrated suppression of early auditory components of the EEG for production compared to perception. To evaluate whether specific neural representations contribute to SIS (in contrast with a global gain change), we fit linear encoding models that predicted scalp EEG based on phonological features, EMG activity, and task condition. We found that phonological features were encoded similarly between production and perception. However, this similarity was only observed when controlling for movement by using the EMG response as an additional regressor. Our results suggest SIS is at the representational level a global gain change between perception and production, not the suppression of specific characteristics of the neural response. We also detail some important considerations when analyzing EEG during continuous speech production.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.