2018
DOI: 10.1016/j.imavis.2018.07.002

Survey on automatic lip-reading in the era of deep learning

Abstract: In the last few years, there has been an increasing interest in developing systems for Automatic LipReading (ALR). Similarly to other computer vision applications, methods based on Deep Learning (DL) have become very popular and have permitted to substantially push forward the achievable performance. In this survey, we review ALR research during the last decade, highlighting the progression from approaches previous to DL (which we refer to as traditional) toward end-to-end DL architectures. We provide a compre…

Cited by 104 publications (77 citation statements). References 115 publications (203 reference statements).
“…In such environments, there is a need to build a speech recognizer that can use both audio and facial visual information. It would also be necessary to have a speech recognizer that relies solely on visual information and does not depend on audio input (Fernandez-Lopez & Sukno, 2018); i.e., such a speech recognizer should extract features from the facial parts, especially lip movement and facial expressions.…”
Section: Literature Survey
confidence: 99%
“…i.e., such a speech recognizer should extract features from the facial parts, especially lip movement and facial expressions. Several attempts have been made to detect speech from lip movement (Almajai, Cox, Harvey, & Lan, 2016; Chung, Senior, Vinyals, & Zisserman, 2017; Dupont & Luettin, 2000; Petridis & Pantic, 2016; Sui, Bennamoun, & Togneri, 2015; Wand, Koutnik, & Schmidhuber, 2016; Yau, Kumar, & Weghorn, 2007; Zhou, Zhao, Hong, & Pietikäinen, 2014), but the results are much lower than those of audio speech recognizers (Fernandez-Lopez & Sukno, 2018).…”
Section: Literature Survey
confidence: 99%
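As an aside for readers new to the area, the "extract features from the lip region" step mentioned in the statements above is usually preceded by locating and cropping the mouth in each video frame. Below is a minimal sketch, assuming dlib's standard 68-point facial landmark model (points 48-67 outline the outer and inner lip contours); the model file path, crop margin, and 88x88 output size are illustrative assumptions, not prescriptions from the survey or the citing papers.

import cv2
import dlib
import numpy as np

# Face detector and landmark predictor; the .dat file is the 68-point model
# distributed separately by dlib (the path here is an assumption).
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def crop_mouth(frame_bgr, margin=10, size=88):
    """Return a grayscale mouth crop (size x size), or None if no face is found."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 1)
    if not faces:
        return None
    landmarks = predictor(gray, faces[0])
    # Landmarks 48-67 cover the lip contours; take their bounding box plus a margin.
    pts = np.array([(landmarks.part(i).x, landmarks.part(i).y) for i in range(48, 68)])
    x0, y0 = pts.min(axis=0) - margin
    x1, y1 = pts.max(axis=0) + margin
    h, w = gray.shape
    crop = gray[max(y0, 0):min(y1, h), max(x0, 0):min(x1, w)]
    return cv2.resize(crop, (size, size))

A sequence of such crops, one per frame, is the typical input to the visual front-ends discussed in the surveyed literature.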
“…The field of visual speech recognition (VSR), or lipreading, has witnessed dramatic breakthroughs recently, primarily due to the paradigm shift from hand-crafted features to deep learning based models [1][2][3][4][5][6][7][8], coupled with the public release of large suitable corpora in a variety of environments [9][10][11][12][13][14][15], as also reviewed in [16,17]. Such models, however, while reducing recognition errors compared to previous approaches, are not as efficient to compute and store.…”
Section: Introduction
confidence: 99%
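To give a concrete picture of the "deep learning based models" referred to in this last statement, here is a minimal, illustrative end-to-end visual speech recognition sketch in PyTorch: a spatiotemporal convolutional front-end feeding a recurrent back-end and a word classifier. All layer sizes, the 500-word vocabulary, and the 29-frame 88x88 input shape are assumptions for illustration; this is not the architecture of the survey or of any particular cited paper.

import torch
import torch.nn as nn

class MiniLipReader(nn.Module):
    def __init__(self, num_classes=500):  # assumed word-level vocabulary size
        super().__init__()
        # 3D convolution over (time, height, width) of grayscale mouth crops
        self.frontend = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=(5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3)),
            nn.BatchNorm3d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1)),
        )
        # Recurrent back-end aggregates per-frame features over time
        self.gru = nn.GRU(input_size=32, hidden_size=128, num_layers=2,
                          batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * 128, num_classes)

    def forward(self, x):
        # x: (batch, 1, frames, H, W) grayscale mouth-region clips
        feats = self.frontend(x)           # (B, 32, T, H', W')
        feats = feats.mean(dim=(3, 4))     # global spatial pooling -> (B, 32, T)
        feats = feats.transpose(1, 2)      # (B, T, 32) for the GRU
        out, _ = self.gru(feats)
        return self.classifier(out[:, -1])  # logits over the word vocabulary

# Example: a batch of 2 clips, 29 frames of 88x88 mouth crops
logits = MiniLipReader()(torch.randn(2, 1, 29, 88, 88))
print(logits.shape)  # torch.Size([2, 500])

The split into a spatiotemporal front-end and a sequence back-end is only one common layout among the end-to-end systems the survey covers; it is shown here purely to ground the terminology used in the citation statements.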