2021
DOI: 10.48550/arxiv.2110.07879
Preprint

Advances and Challenges in Deep Lip Reading

Abstract: Driven by deep learning techniques and large-scale datasets, recent years have witnessed a paradigm shift in automatic lip reading. While the main thrust of Visual Speech Recognition (VSR) was improving accuracy of Audio Speech Recognition systems, other potential applications, such as biometric identification, and the promised gains of VSR systems, have motivated extensive efforts on developing the lip reading technology. This paper provides a comprehensive survey of the state-of-the-art deep learning based VS…

Cited by 2 publications (2 citation statements)
References 101 publications
“…For instance, in the CTC configuration, 'Hel-lo' is the correct representation of 'Hello,' where 'l' is duplicated. The CTC loss function accepts a model output matrix consisting of scores assigned to each token at every time step alongside the actual truth sequence [27].…”
Section: CTC Loss Function
mentioning
confidence: 99%
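
The collapse rule behind the 'Hel-lo' example in the statement above can be sketched in a few lines of Python. This is a minimal illustration, not code from the cited paper: it assumes '-' is the blank symbol, and the names `ctc_collapse` and `greedy_decode`, the toy vocabulary, and the score matrix are hypothetical. Consecutive repeated symbols are merged first, and blanks are dropped afterwards, which is why a blank is needed between the two 'l' characters of 'Hello'.

```python
from itertools import groupby

BLANK = "-"  # assumed blank symbol, matching the 'Hel-lo' example


def ctc_collapse(alignment, blank=BLANK):
    """Merge consecutive repeated symbols, then drop blank symbols."""
    merged = (symbol for symbol, _ in groupby(alignment))
    return "".join(symbol for symbol in merged if symbol != blank)


def greedy_decode(scores, vocab, blank=BLANK):
    """Take the highest-scoring token at each time step, then collapse.

    `scores` is a list of rows, one per time step, each holding one
    score per entry of `vocab` (a hypothetical toy layout).
    """
    best_path = "".join(
        vocab[max(range(len(row)), key=row.__getitem__)] for row in scores
    )
    return ctc_collapse(best_path, blank)


if __name__ == "__main__":
    print(ctc_collapse("Hel-lo"))  # blank keeps the double 'l'
    print(ctc_collapse("Hello"))   # without a blank the two 'l's merge

    toy_vocab = ["-", "a", "b"]
    toy_scores = [
        [0.1, 0.8, 0.1],
        [0.1, 0.7, 0.2],
        [0.9, 0.05, 0.05],
        [0.1, 0.8, 0.1],
        [0.2, 0.1, 0.7],
    ]
    print(greedy_decode(toy_scores, toy_vocab))
```

Greedy decoding is only one way to read a label sequence out of the per-time-step score matrix; the CTC loss itself sums over every alignment that collapses to the ground-truth sequence, which this sketch does not compute.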
“…Sooraj et al. conducted a review of the general structure of ALR systems and how they work [38]. Oghbaie similarly reviewed the general structure of ALR systems and determined areas for future advancements in the systems' structures [39]. Agrawal et al. conducted a literature review that focused on the methods by which the networks are trained, particularly comparing the preprocessing and pretraining methods that benefit ALR systems [40].…”
Section: Introduction
mentioning
confidence: 99%