ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp40776.2020.9053299
CoGANs for Unsupervised Visual Speech Adaptation to New Speakers

Cited by 6 publications (4 citation statements) | References 14 publications
“…In fact, it has been proven that each person produces speech in a unique way [36], a finding that supports the idea that visual speech features are highly sensitive to the identity of the speaker [29]. However, although a wide range of works have studied the speaker adaptation of end-to-end systems in the field of ASR [37][38][39][40], only a few works in this regard have addressed VSR [41,42]. Although this speaker-dependent approach makes for a less demanding task, it should not be forgotten that speaker-adapted VSR systems could be helpful in a non-invasive and inconspicuous way for people who suffer from communication difficulties [15][16][17].…”

Section: Introduction

confidence: 93%
“…Kandala et al [41] defined an architecture based on the CTC paradigm [23] where, after computing visual speech features, a speaker-specific identity vector was integrated as an additional input to the decoder. Fernandez-Lopez et al [42] approached the problem indirectly, studying how to adapt the visual front end of an audiovisual recognition system. Specifically, the authors proposed an unsupervised method that allowed an audiovisual system to be adapted when only visual data were available.…”
Section: Related Work
confidence: 99%
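The conditioning scheme described above — tiling a per-utterance speaker identity vector across time and feeding it alongside the visual features into a CTC decoder — can be sketched as follows. This is a minimal illustration, not the cited paper's implementation: the function name, the 256/32 feature dimensions, and the use of simple concatenation are all assumptions for the example.

```python
import numpy as np

def condition_on_speaker(visual_feats, speaker_vec):
    """Append a fixed per-utterance speaker vector to every frame's features.

    visual_feats: (T, D) array of per-frame visual speech features.
    speaker_vec:  (S,) speaker identity vector (e.g. an i-vector).
    Returns a (T, D + S) array the decoder consumes frame by frame.
    """
    T = visual_feats.shape[0]
    tiled = np.tile(speaker_vec, (T, 1))                  # (T, S): repeat across time
    return np.concatenate([visual_feats, tiled], axis=1)  # (T, D + S)

# Illustrative sizes: 75 video frames, 256-dim features, 32-dim speaker vector.
feats = np.random.randn(75, 256)
spk = np.random.randn(32)
out = condition_on_speaker(feats, spk)
print(out.shape)  # (75, 288)
```

In this scheme the speaker vector acts as a constant side input at every time step, so adapting to a new speaker only requires estimating a new identity vector rather than retraining the recognizer.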
“…Kandala et al [16] defined an architecture based on the Connectionist Temporal Classification (CTC) paradigm [26] where, once visual speech features were computed, a speakerspecific identity vector was integrated as an additional input to the decoder. Moreover, Fernandez-Lopez et al [17] approached the problem indirectly, since it was studied how to adapt the visual front-end of an audio-visual recognition system. Thus, the authors proposed an unsupervised method that allowed an audiovisual system to be adapted when only the visual channel was available.…”
Section: Related Workmentioning
confidence: 99%
“…As detailed in Section 2, there is a wide range of works which have studied the speaker adaptation of end-to-end systems in the field of Acoustic Speech Recognition (ASR) [12,13,14,15]. On the contrary, few works in this regard have addressed VSR [16,17]. Although this speaker-dependent approach means facing a less demanding task, it should not be forgotten that speaker-adapted VSR systems could be helpful, in a non-invasive and inconspicuous way, for people who suffer from communication difficulties [18,19].…”

Section: Introduction
confidence: 99%