Interspeech 2019
DOI: 10.21437/interspeech.2019-3237

Speaker Adaptation for Lip-Reading Using Visual Identity Vectors

Abstract: Visual speech recognition or lipreading suffers from high word error rate (WER) as lipreading is based solely on articulators that are visible to the camera. Recent works mitigated this problem using complex architectures of deep neural networks. I-vector-based speaker adaptation is a well-known technique in ASR systems used to reduce WER on unseen speakers. In this work, we explore speaker adaptation of lipreading models using latent identity vectors (visual i-vectors) obtained by factor analysis on visual fea…

Cited by 4 publications (4 citation statements)
References 27 publications
“…In fact, it has been proven that each person produces speech in a unique way [36], a finding that supports the idea that visual speech features are highly sensitive to the identity of the speaker [29]. However, although a wide range of works have studied the speaker adaptation of end-to-end systems in the field of ASR [37][38][39][40], only a few works in this regard have addressed VSR [41,42]. Although this speaker-dependent approach makes for a less demanding task, it should not be forgotten that speaker-adapted VSR systems could be helpful in a non-invasive and inconspicuous way for people who suffer from communication difficulties [15][16][17].…”
Section: Introduction (mentioning)
confidence: 93%
“…Although this research describes approaches that could be adopted to any speech modality, it is noteworthy that few works have explicitly focused on speaker adaptation for VSR systems. Kandala et al [41] defined an architecture based on the CTC paradigm [23] where, after computing visual speech features, a speaker-specific identity vector was integrated as an additional input to the decoder. Fernandez-Lopez et al [42] approached the problem indirectly, studying how to adapt the visual front end of an audiovisual recognition system.…”
Section: Related Work (mentioning)
confidence: 99%
“…Albeit this research describes approaches that could be adopted to any speech modality, it is noteworthy that few works have focused specifically on speaker adaptation for VSR systems. Kandala et al [16] defined an architecture based on the Connectionist Temporal Classification (CTC) paradigm [26] where, once visual speech features were computed, a speaker-specific identity vector was integrated as an additional input to the decoder. Moreover, Fernandez-Lopez et al [17] approached the problem indirectly, since it was studied how to adapt the visual front-end of an audio-visual recognition system.…”
Section: Related Work (mentioning)
confidence: 99%
“…As detailed in Section 2, there is a wide range of works which have studied the speaker adaptation of end-to-end systems in the field of Acoustic Speech Recognition (ASR) [12,13,14,15]. On the contrary, few works in this regard have been addressed in VSR [16,17]. Albeit this speaker-dependent approach means facing a less demanding task, it should not be forgotten that speaker-adapted VSR systems could be helpful, in a non-invasive and inconspicuous way, for people who suffer from communication difficulties [18,19].…”
Section: Introduction (mentioning)
confidence: 99%