2022
DOI: 10.48550/arxiv.2205.10839
Preprint

Deep Learning for Visual Speech Analysis: A Survey

Abstract: Visual speech, referring to the visual domain of speech, has attracted increasing attention due to its wide applications, such as public security, medical treatment, military defense, and film entertainment. As a powerful AI strategy, deep learning techniques have extensively promoted the development of visual speech learning. Over the past five years, numerous deep learning based methods have been proposed to address various problems in this area, especially automatic visual speech recognition and generation.…

Citations: cited by 6 publications (10 citation statements)
References: 145 publications
“…On the other hand, some other reviews, such as [73,79], cover more recent datasets. However, these works provide only a combined list of datasets in one table and focus mainly on lip-reading datasets rather than on AV datasets.…”
Section: Recent Audio-visual Speech Datasets
Citation type: mentioning
confidence: 99%
“…), the development of AVSR methodology is still at an early stage and does not yet meet the performance standards required for practical implementation in real-world applications. This is certainly not due to a lack of effort on the part of researchers, as there have been many excellent works on AVSR [73]. Therefore, a comprehensive analysis of recent advances, identification of key barriers and unresolved issues, and exploration of potential avenues for future research are essential.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…In this introduction, we provide an overview of the motivation behind leveraging deep learning for lip reading, highlighting the challenges faced by traditional methods and the potential of deep learning approaches to overcome these challenges. [4] We also outline the objectives of our proposed approach and the structure of the paper. Through this research, we aim to contribute to the advancement of lip-reading technology, with potential applications in communication aids, accessibility solutions, and human-computer interaction systems.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…However, current methods mainly focus on improving the synchronization between lip movements and speech [38], neglecting the emotional variation of facial expressions. We argue that emotions are an essential aspect of human communication and expression, and emotion absence in 3D facial animations may cause the uncanny valley effect.…”
Section: Introduction
Citation type: mentioning
confidence: 99%