2023
DOI: 10.1109/tmm.2022.3210761
Importance-Aware Information Bottleneck Learning Paradigm for Lip Reading

Abstract: Lip reading is the task of decoding text from speakers' mouth movements. Numerous deep learning-based methods have been proposed to address this task. However, these existing deep lip reading models suffer from poor generalization due to overfitting the training data. To resolve this issue, we present a novel learning paradigm that aims to improve the interpretability and generalization of lip reading models. Specifically, a Variational Temporal Mask (VTM) module is customized to automatically analyze the impor…
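The abstract describes a Variational Temporal Mask module that learns the importance of individual frames under an information bottleneck objective. The paper's exact formulation is not shown here (the abstract is truncated), so the following is only a minimal illustrative sketch of how such a module could look: a stochastic per-frame mask sampled with a reparameterized (Gumbel-sigmoid) relaxation, regularized by a KL term toward a sparse prior. All names, the prior keep-rate, and the relaxation choice are assumptions, not the authors' method.

```python
import torch
import torch.nn as nn

class VariationalTemporalMask(nn.Module):
    """Hypothetical VTM-style sketch: predicts a stochastic per-frame
    importance mask and returns a KL penalty as a bottleneck term."""

    def __init__(self, feat_dim: int, prior_keep: float = 0.3):
        super().__init__()
        # Small head mapping each frame's features to a mask logit.
        self.to_logit = nn.Linear(feat_dim, 1)
        self.prior_keep = prior_keep  # illustrative sparse prior

    def forward(self, x: torch.Tensor, temperature: float = 0.5):
        # x: (batch, time, feat_dim) sequence of frame features.
        logits = self.to_logit(x).squeeze(-1)  # (batch, time)
        if self.training:
            # Reparameterized Bernoulli (Gumbel-sigmoid) sample keeps
            # the mask differentiable during training.
            u = torch.rand_like(logits).clamp(1e-6, 1 - 1e-6)
            noise = torch.log(u) - torch.log1p(-u)
            mask = torch.sigmoid((logits + noise) / temperature)
        else:
            mask = torch.sigmoid(logits)
        # KL between the per-frame keep-probability and a sparse
        # Bernoulli prior discourages retaining uninformative frames.
        p = torch.sigmoid(logits)
        q = torch.full_like(p, self.prior_keep)
        kl = (p * torch.log(p / q + 1e-8)
              + (1 - p) * torch.log((1 - p) / (1 - q) + 1e-8)).mean()
        return x * mask.unsqueeze(-1), kl
```

In an information-bottleneck setup of this kind, the returned `kl` term would typically be added to the recognition loss with a weighting coefficient (e.g. `loss = ce_loss + beta * kl`), trading task accuracy against how much temporal information the mask lets through.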

Cited by 3 publications (1 citation statement) · References 59 publications
“…Deep learning technology has recently exhibited significant performance improvements over traditional prediction techniques in sentence-level speech recognition, visual speech recognition, and audio-visual recognition studies [26]–[29]. When comparing word recognition rates for phrases on the same benchmark dataset, deep learning-based visual recognition studies [17] improved by 36.4% over conventional visual recognition studies [30].…”
Section: Related Work
confidence: 99%