2009
DOI: 10.1121/1.3250425
Speech identification in noise: Contribution of temporal, spectral, and visual speech cues

Abstract: This study investigated the degree to which two types of reduced auditory signals (cochlear implant simulations) and visual speech cues combined for speech identification. The auditory speech stimuli were filtered to have only amplitude envelope cues or both amplitude envelope and spectral cues and were presented with/without visual speech. In Experiment 1, IEEE sentences were presented in quiet and noise. For in-quiet presentation, speech identification was enhanced by the addition of both spectral and visual…

Cited by 8 publications (8 citation statements)
References 48 publications
“…An explanation for the finding that PC1 was the motion component that showed a significant correlation with the percent correct in noise AV scores is that the PC1 movements provided information about the auditory speech signal that could be used to parse this signal from the competing background noise (see Davis and Kim 2004;Kim et al 2009). To determine whether this was the case, a correlation analysis was conducted to examine the degree to which PC1 and the wideband intensity speech envelope (see figure 5) was correlated for in quiet and in noise productions.…”
Section: Results
confidence: 99%
“…The perceptual doping effect in the AV1–A2 modality order may have been so strong that it greatly helped the listeners to decode the temporal cues necessary to discriminate vowel duration in the vowel duration discrimination task and to extract phonological cues for vowel identification in the gated vowel identification task, subsequently boosting the participants’ performance on these tasks in the A modality. The addition of V cues might have a stronger effect for consonants than vowels in terms of their AV identification (Kim et al 2009; Moradi et al 2017a), which could explain why the abovementioned effect for vowels was not observed in the gated consonant task. For instance, Moradi et al (2017a) reported that the effect of adding V cues on AV identification and cognitive demand reduction was more evident for consonants than for vowels (i.e., more V saliency for the AV identification of consonants than vowels).…”
Section: Discussion
confidence: 99%
“…Everyday speech comprehension is multi-faceted: in face-to-face conversation, the listener receives information from the voice and face of a talker, the accompanying gestures of the hands and body and the overall semantic context of the discussion, which can all be used to aid comprehension of the spoken message. Behaviourally, auditory speech comprehension is enhanced by simultaneous presentation of a face or face-like visual cues (Sumby and Pollack, 1954; Grant & Seitz, 2000b; Girin et al, 2001; Kim & Davis, 2004; Bernstein et al, 2004; Schwartz et al, 2004; Helfer & Freyman, 2005; Ross et al, 2007; Thomas & Pilling, 2007; Bishop & Miller, 2009; Kim et al, 2009; Ma et al, 2009; Hazan et al, 2010). Higher-order linguistic information can also benefit intelligibility: words presented in a sentence providing a rich semantic context are more intelligible than words in isolation or in an abstract sentence, particularly when auditory clarity is compromised (Miller & Isard, 1963; Kalikow et al, 1977; Pichora-Fuller et al, 1995; Dubno et al, 2000; Grant & Seitz, 2000a; Stickney & Assmann, 2001, Obleser et al, 2007).…”
Section: Introduction
confidence: 99%