2019
DOI: 10.1126/sciadv.aay6279

A speech envelope landmark for syllable encoding in human superior temporal gyrus

Abstract: The most salient acoustic features in speech are the modulations in its intensity, captured by the amplitude envelope. Perceptually, the envelope is necessary for speech comprehension. Yet, the neural computations that represent the envelope and their linguistic implications are heavily debated. We used high-density intracranial recordings, while participants listened to speech, to determine how the envelope is represented in human speech cortical areas on the superior temporal gyrus (STG). We found that a wel…
Cited by 153 publications (179 citation statements) · References 58 publications

Citation statements (ordered by relevance):
“…However, we also wanted to explore simpler models to determine the level of complexity that would be required to optimize prediction. Sound onsets offer a promising candidate for a neurally relevant, low-dimensional auditory feature [36, 39, 40, 43]. As a first test of this hypothesis, we reduced the articulatory features to phoneme onsets (Sg & PhOn, pink).…”
Section: Results (mentioning)
confidence: 99%
“…Another possibility is that the performance boost provided by articulatory features is instead attributable to their correlation with simpler acoustic properties. It has repeatedly been observed that neuronal responses from bilateral superior temporal regions are particularly sensitive to acoustic edges [35, 36, 37, 38, 39, 40]. Features that extract these onsets from envelope representations via a half-wave rectification of the temporal gradient of time-varying energy have been used in several studies [36, 41, 42].…”
Section: Introduction (mentioning)
confidence: 99%
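The onset feature mentioned in the statement above (half-wave rectification of the temporal gradient of time-varying energy) can be sketched in a few lines. The following is a minimal illustration, not an implementation from the cited studies: the use of a Hilbert envelope, the 10 Hz smoothing cutoff, and the function name `onset_feature` are all assumptions made for the example.

```python
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def onset_feature(waveform, fs, cutoff_hz=10.0):
    """Half-wave rectified temporal gradient of the amplitude envelope.

    Sketch of the acoustic-edge feature described in the quoted text:
    envelope -> low-pass smoothing -> first derivative -> keep rises only.
    The Hilbert envelope and 10 Hz cutoff are illustrative choices, not
    parameters taken from the cited studies.
    """
    # Broadband amplitude envelope via the analytic signal.
    envelope = np.abs(hilbert(waveform))

    # Smooth so the derivative tracks syllable-scale intensity modulations.
    b, a = butter(4, cutoff_hz / (fs / 2), btype="low")
    envelope = filtfilt(b, a, envelope)

    # First temporal derivative, then half-wave rectification:
    # only increases in intensity (acoustic edges) are retained.
    d_env = np.gradient(envelope) * fs
    return np.maximum(d_env, 0.0)
```

The resulting feature is zero during steady or decaying intensity and positive only while intensity rises, which is what makes it a compact onset representation.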
“…One sensible candidate for such a transformation is the first temporal derivative, which is commonly used to augment spectral acoustic features in automatic speech recognition algorithms (Hunt, 1999). Recent data show that neuronal populations in the human auditory cortex do not encode moment-by-moment changes in the amplitude envelope of speech, but rather detect local maxima (i.e., onset edges) in the first derivative of the envelope (Oganian & Chang, 2018). These local maxima often occur during consonant-vowel transitions and, indeed, some of our own work shows that visual speech contributes maximally to audiovisual fusion (McGurk & MacDonald, 1976) during peaks in the first derivative of the interlip distance function (i.e., lip velocity) occurring at a consonant-vowel transition (Venezia, Thurman, Matchin, George, & Hickok, 2015).…”
Section: Discussion (mentioning)
confidence: 99%
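The statement above describes responses that detect discrete local maxima in the envelope's first derivative rather than tracking the envelope moment by moment. Below is a hedged sketch of such landmark picking, built on the `onset_feature` helper sketched earlier; the minimum inter-landmark spacing and the prominence threshold are illustrative assumptions, not values reported in the paper.

```python
from scipy.signal import find_peaks

def envelope_landmarks(waveform, fs, min_interval_s=0.1):
    """Times of discrete peaks in the rectified envelope derivative.

    Each peak is treated as one acoustic-edge landmark (roughly one per
    syllable onset). The thresholds below are illustrative assumptions only.
    """
    rate_of_rise = onset_feature(waveform, fs)    # from the sketch above
    peaks, _ = find_peaks(
        rate_of_rise,
        distance=int(min_interval_s * fs),        # at most ~10 landmarks per second
        prominence=0.1 * rate_of_rise.max(),      # ignore small fluctuations
    )
    return peaks / fs                             # landmark times in seconds
```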
“…At the syllabic level, it has been shown that auditory cortex neuronal oscillations in the theta range are elicited without higher-level speech processing (Howard and Poeppel, 2010; Rimmele et al., 2015), and oscillations occur in sync with acoustic cues at the syllabic scale (Doelling et al., 2014; Oganian and Chang, 2019). At the phrasal level, knowledge about the nature of the acoustic cues that constitute accentuation is sparse, even though linguistic cues in the delta range have been pointed out to be relevant for comprehension (Aubanel et al., 2016).…”
Section: Acoustic Driven Versus Top-Down Driven Delta Periodicities (mentioning)
confidence: 99%
“…These models proved to be capable of explaining a range of counterintuitive psychophysical data (e.g., Ghitza and Greenberg, 2009; Ghitza, 2011, 2012, 2014) that are hard to explain by conventional models of speech perception. Doelling et al. (2014) provided MEG evidence for this computational principle, showing that perceptual segmentation at the syllabic level appears to require acoustic-driven theta neuronal oscillations (see also Park et al., 2015; Oganian and Chang, 2019).…”
Section: Introduction (mentioning)
confidence: 99%