2022
DOI: 10.3389/fnins.2021.781196
Speech-Driven Facial Animations Improve Speech-in-Noise Comprehension of Humans

Abstract: Understanding speech becomes a demanding task when the environment is noisy. Comprehension of speech in noise can be substantially improved by looking at the speaker’s face, and this audiovisual benefit is even more pronounced in people with hearing impairment. Recent advances in AI have made it possible to synthesize photorealistic talking faces from a speech recording and a still image of a person’s face in an end-to-end manner. However, it has remained unknown whether such facial animations improve speech-in-noise c…


Cited by 7 publications (14 citation statements). References 31 publications.
“…This study demonstrates that in the absence of a natural visual stimulus, speech comprehension can be enhanced by a synthesized, realistic talking face that is generated purely from the acoustic signal using a DNN-based model. This result is consistent with the recently submitted preprint by Varano et al. (2022), who showed that a GAN model improves speech comprehension in noise for both humans and an AI speech recognition system, although they also found that the natural face performed better than the model-generated face. That study tested speech comprehension at only one SNR (−8.82 dB), with speech-weighted noise as the background noise.…”
Section: Discussion (supporting)
confidence: 91%
“…The synthetic faces in both this study and in Varano et al. (2022) provided a benefit approximately half as large as the natural face did. We believe these benefits stem from realistic faces and especially accurate mouth and articulator shapes.…”
Section: Discussion (mentioning)
confidence: 63%
“…Our study replicates decades of research by showing that seeing the face of a real talker improves speech-in-noise perception (Peelle and Sommers, 2015; Sumby and Pollack, 1954). Our study also confirms two recent reports that viewing a synthetic face generated by a deep neural network (DNN) significantly improves speech-in-noise perception (Shan et al., 2022; Varano et al., 2022). Both the present study and these previous reports found that the improvement from viewing DNN faces was only about half that provided by viewing real faces.…”
Section: Discussion (supporting)
confidence: 90%
“…The present study has a number of limitations. In order to maximize the number of tested words and minimize experimental time, only a single noise level was tested, as in a previous study of DNN faces (Varano et al., 2022), with a high level of noise selected to maximize the benefit of visual speech (Rennig et al., 2020). Another previous study of DNN faces tested multiple noise levels and found a lawful relationship between different noise levels and perception (Shan et al., 2022).…”
Section: Limitations of the Present Study (mentioning)
confidence: 99%