2009
DOI: 10.1371/journal.pcbi.1000436
The Natural Statistics of Audiovisual Speech

Abstract: Humans, like other animals, are exposed to a continuous stream of signals, which are dynamic, multimodal, extended, and time-varying in nature. This complex input space must be transduced and sampled by our sensory systems and transmitted to the brain where it can guide the selection of appropriate actions. To simplify this process, it has been suggested that the brain exploits statistical regularities in the stimulus space. Tests of this idea have largely been confined to unimodal signals and natural scenes. On…

Cited by 590 publications (679 citation statements)
References 75 publications (128 reference statements)
“…For simplicity we focus here on a sensor‐level time‐domain analysis. Since previous work has shown speech entrainment mostly at lower frequencies, we extracted the wideband amplitude envelope of the speech stimulus [Chandrasekaran et al, 2009; Gross et al, 2013] and then low‐pass filtered with a 12 Hz cutoff (third order noncausal Butterworth). The MEG signal was obtained from a 248‐magnetometer whole‐head MEG system (MAGNES 3600 WH, 4D Neuroimaging).…”
Section: Results
confidence: 99%
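The excerpt above describes the envelope-extraction step in words rather than code: compute the wideband amplitude envelope of the speech waveform, then low-pass filter it at 12 Hz with a third-order zero-phase (noncausal) Butterworth filter. The sketch below shows one plausible way to do this with SciPy; the file name, the mono-recording assumption, and the use of the Hilbert envelope are illustrative choices, not the cited authors' code.

import numpy as np
from scipy.io import wavfile
from scipy.signal import hilbert, butter, filtfilt

# Hypothetical input file; assumed to be a mono speech recording.
fs, speech = wavfile.read("speech_stimulus.wav")
speech = speech.astype(np.float64)

# Wideband amplitude envelope: magnitude of the analytic signal.
envelope = np.abs(hilbert(speech))

# Third-order Butterworth low-pass at 12 Hz, run forward and backward
# (filtfilt), so the overall filter is zero-phase, i.e. noncausal.
b, a = butter(3, 12.0 / (fs / 2.0), btype="low")
envelope_lp = filtfilt(b, a, envelope)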
“…Firstly, it is appropriate to note that lipread input is bottom-up visual information that is usually perceived in synchrony with the auditory stream, although it actually often precedes the acoustic signal (Chandrasekaran et al, 2009). Lexical information, in contrast, exerts a top-down influence on auditory processing, and is argued to become important as the word is being recognized (e.g., Samuel & Pitt, 2003).…”
Section: Lipread Versus Lexically Induced Recalibration; Future Directions
confidence: 99%
“…Given that in AV speech, lipread input usually precedes the auditory signal (e.g., Chandrasekaran, Trubanova, Stillittano, Caplier, & Ghazanfar, 2009), Stekelenburg and Vroomen (2007) argued that the visually-induced N1 modulations arise whenever the visual input precedes the audio, thus warning the listener about when the sound is going to occur. This was corroborated by similar results obtained with artificial AV stimuli in which anticipatory visual motion reliably predicted sound onset (Vroomen & Stekelenburg, 2010).…”
Section: Introduction
confidence: 99%
“…Our experimental paradigm capitalizes on a natural ~150-ms temporal lag between the onset of facial movements and vocal chord vibration that naturally occurs when we speak [9]. This lag allows the observer to synthesize phonological predictions before auditory onset on the basis of visual information.…”
confidence: 99%
“…Here, sensory prediction errors were induced experimentally by violating the congruence between visual and auditory information. We also exploited the specificity with which visual input predicts auditory input to create a gradient of perceived incongruence [8][9][10]. We thus induced graded prediction errors by varying audio-visual congruence and expected them to be manifest at the neural level in the interaction between the amount of predictive information conveyed by the visual input (predictiveness) and the validity of this information with respect to incoming auditory input (audio-visual congruence).…”
confidence: 99%