2022
DOI: 10.1177/23312165221136934

Speech-In-Noise Comprehension is Improved When Viewing a Deep-Neural-Network-Generated Talking Face

Abstract: Listening in a noisy environment is challenging, but many previous studies have demonstrated that comprehension of speech can be substantially improved by looking at the talker's face. We recently developed a deep-neural-network (DNN) based system that generates movies of a talking face from speech audio and a single face image. In this study, we aimed to quantify the benefits that such a system can bring to speech comprehension, especially in noise. The target speech audio was masked with signal to noise rati…
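The abstract describes masking the target speech with noise at controlled signal-to-noise ratios (SNRs). As a rough illustration of how such stimuli are typically prepared (not the authors' actual code; the function name `mix_at_snr` and the tone/noise signals are invented for this sketch), a masker can be scaled so that the speech-to-noise power ratio hits a target value in dB:

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so the speech-to-noise power ratio equals snr_db, then mix."""
    noise = noise[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Gain that brings the masker to the target SNR relative to the speech.
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise

# Toy example: a 220 Hz tone stands in for speech, white noise is the masker,
# mixed at -6 dB SNR (noise power 4x the speech power).
fs = 16000
t = np.arange(fs) / fs
speech = 0.1 * np.sin(2 * np.pi * 220 * t)
noise = np.random.default_rng(0).normal(size=fs)
mixed = mix_at_snr(speech, noise, snr_db=-6.0)
```

With real stimuli the speech and masker would be recorded audio rather than synthetic signals, but the scaling logic is the same.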

Cited by 5 publications (12 citation statements) | References 53 publications
“…Our study replicates decades of research by showing that seeing the face of a real talker improves speech-in-noise perception (Peelle and Sommers, 2015; Sumby and Pollack, 1954). Our study also confirms two recent reports that viewing a synthetic face generated by a deep neural network (DNN) significantly improves speech-in-noise perception (Shan et al., 2022; Varano et al., 2022). Both the present study and these previous reports found that the improvement from viewing DNN faces was only about half that provided by viewing real faces.…”
Section: Discussion (supporting)
confidence: 91%
“…For comparison, the same pairing with a real visual face evoked the percept of /v/ on 94% of trials (Dias et al., 2016; Shahin, 2019). Taken together, this indicates that for incongruent auditory-visual speech, synthetic faces influenced perception much less than real faces, consistent with the real–synthetic difference for speech-in-noise observed in the present study and in (Shan et al., 2022; Varano et al., 2022).…”
Section: Discussion (supporting)
confidence: 90%
“…The ability to rapidly generate a synthetic face saying arbitrary words suggests the possibility of an “audiovisual hearing aid” that displays a synthetic talking face to improve comprehension. This possibility received support from two recent studies that used deep neural networks (DNNs) to generate realistic, synthetic talking faces (Shan et al., 2022; Varano et al., 2022). Both studies found that viewing synthetic faces significantly improved speech-in-noise perception, but the benefit was only about half as much as viewing a real human talker.…”
Section: Introduction (mentioning)
confidence: 99%
“…To test this idea, we undertook a behavioral study to compare the perception of speech-in-noise on its own; speech-in-noise with real faces (to serve as a benchmark); and speech-in-noise presented with two types of synthetic faces. The first synthetic face type was generated by a deep neural network, as in the studies of Shan et al. (2022) and Varano et al. (2022). The second synthetic face type was generated using FACS, as implemented in the commercial software package JALI (Edwards et al., 2016; Zhou et al., 2018).…”
Section: Introduction (mentioning)
confidence: 99%