2020 5th International Conference on Communication and Electronics Systems (ICCES) 2020
DOI: 10.1109/icces48766.2020.9137926

Visual Speech Recognition: A Deep Learning Approach

Cited by 12 publications (4 citation statements)
References 10 publications
“…Accurate recognition of spoken words enhances the system's user experience, reliability, and productivity, making it an essential consideration when designing and evaluating VSR systems. Hence, the right combination of DL architectures and datasets greatly improves the VSR system, which is visible in [161], [163].…”
Section: E. The Vital Combination of DL Architecture and Proper Datasets
mentioning
confidence: 98%
“…A customized dataset with more than 48,000 images was used to evaluate the lip segmentation using this model and a testing accuracy of 98.4% was depicted. Navin et al [163] proposed a deep learning model for VSR to perform word-level classification. ResNet architecture is used along with 3D convolution layers and Gated recurrent units (GRU).…”
Section: Sequence of Video Frames
mentioning
confidence: 99%
“…Visual-only approach, where spoken word or letter is decided solely based on visual cues, spanned across multiple languages, such as English [8], Japanese [9] and Bahasa Indonesia [10]. For Bahasa Indonesia, by focusing on the nature of lip motion acting as a visual cue of the speaker, Maxalmina concluded that it is possible to detect specific vowels with the highest accuracy of 84%.…”
Section: Introduction
mentioning
confidence: 99%
“…Another internal element that might cause intra-speaker vocal fluctuation is emotion [6]. Emotion identification is the process for detecting and identifying emotions.…”
mentioning
confidence: 99%