When we follow one speaker among several, are we merely tracking the sound that speaker makes, the words that speaker uses, or are we aligning our minds to the attended speaker? In this process, selective attention plays a critical role. Previous studies have mainly focused on attentional modulation of single feature types from the speaker. However, attending to a speaker requires an integrative process: the sound waves, which carry the acoustic features, must be heard; the meaning of the speech, which carries the semantic features, must be understood; and listener and speaker must establish common ground during the process, which constitutes the inter-brain feature. We still do not know how our brain entrains to these different types of features from the speaker in an integrative way. In this study, naturalistic speech was adopted to explore attentional modulation, and natural language processing models together with inter-brain recording methods were used to quantify the different types of information from the speaker.
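As an illustration of how such speaker-side features can be quantified, the sketch below extracts an acoustic amplitude envelope and a word-level semantic regressor from a speech recording. This is a minimal sketch under assumed conventions, not the pipeline used in the study: the function names, the 8 Hz envelope cutoff, and the use of an embedding norm as the per-word semantic weight are all illustrative assumptions.

```python
# Illustrative sketch (not the authors' pipeline) of extracting the two
# speaker-side feature types named above: an acoustic envelope and a
# word-level semantic time series. All names are assumptions.

import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def acoustic_envelope(audio, fs, cutoff_hz=8.0):
    """Broadband amplitude envelope via the Hilbert transform,
    low-pass filtered to the range where cortical tracking is measured."""
    env = np.abs(hilbert(audio))
    b, a = butter(4, cutoff_hz / (fs / 2), btype="low")
    return filtfilt(b, a, env)

def semantic_regressor(word_onsets_s, word_vectors, fs_out, duration_s):
    """Impulse train at word onsets, each impulse weighted by a scalar
    derived from an NLP-model word embedding (placeholder: vector norm)."""
    reg = np.zeros(int(duration_s * fs_out))
    for t, vec in zip(word_onsets_s, word_vectors):
        reg[int(t * fs_out)] = np.linalg.norm(vec)
    return reg

# Toy usage with synthetic data.
fs = 16000
audio = np.random.randn(fs * 10)          # 10 s of fake speech
env = acoustic_envelope(audio, fs)
onsets = np.arange(0.5, 10.0, 0.4)        # fake word onsets every 400 ms
vecs = np.random.randn(len(onsets), 300)  # fake 300-d word embeddings
sem = semantic_regressor(onsets, vecs, fs_out=100, duration_s=10.0)
```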
Our study revealed that attention modulates the entrainment to acoustic features, semantic features, and the speaker's neural activity in an integrative way. When the listener attends to a speaker, the sound is processed first: theta-band neural activity entrains the acoustic features with a latency of 200–350 ms. The meaning of the attended speech is then parsed: attention modulates delta-band entrainment to the semantic features with a latency of 200–600 ms. Moreover, the listener entrains the attended speaker's neural activity up to 5 s before speech onset. Entrainment to the acoustic features and entrainment to the semantic features are correlated with each other, and only entrainment to the attended speaker's neural activity shows a significant negative correlation with the comprehension score. Together, these results provide an integrative view of how our brain is selectively entrained to different types of information from the speaker.
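To make the latency estimates above concrete, the sketch below shows one simple way a band-specific entrainment lag could be measured: band-pass the neural signal into the theta range, correlate it with the acoustic envelope across a range of lags, and read off the lag of peak correlation. This is an illustrative lagged-correlation sketch on synthetic data, not the study's analysis (speech-tracking studies often use richer methods such as temporal response functions); the 4–8 Hz theta definition and all names here are assumptions.

```python
# Illustrative sketch of estimating a band-specific entrainment latency
# (e.g., "theta, 200-350 ms") via lagged correlation on synthetic data.

import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(x, fs, lo, hi, order=4):
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

def lagged_correlation(feature, eeg, fs, max_lag_s=0.8):
    """Correlation between feature and EEG as a function of the lag by
    which the EEG trails the feature (positive lag = brain after sound)."""
    lags = np.arange(0, int(max_lag_s * fs))
    r = np.empty(len(lags))
    for i, lag in enumerate(lags):
        r[i] = np.corrcoef(feature[: len(feature) - lag], eeg[lag:])[0, 1]
    return lags / fs, r

# Toy usage: EEG = delayed, noisy copy of the envelope (true lag 250 ms).
fs = 100
env = np.random.randn(60 * fs)
eeg = np.roll(env, int(0.25 * fs)) + 2.0 * np.random.randn(len(env))
eeg_theta = bandpass(eeg, fs, 4.0, 8.0)   # theta band, 4-8 Hz (assumed)
lags_s, r = lagged_correlation(bandpass(env, fs, 4.0, 8.0), eeg_theta, fs)
print(f"peak lag ~ {lags_s[np.argmax(r)] * 1000:.0f} ms")
```

On this synthetic signal the recovered peak lag is the 250 ms delay built into the toy data, analogous in form to the 200–350 ms acoustic latency reported above.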