2022
DOI: 10.32604/cmc.2022.019450
|View full text |Cite
|
Sign up to set email alerts
|

Deep Learning-Based Approach for Arabic Visual Speech Recognition

Abstract: Lip-reading technologies are rapidly progressing following the breakthrough of deep learning. It plays a vital role in its many applications, such as: human-machine communication practices or security applications. In this paper, we propose to develop an effective lip-reading recognition model for Arabic visual speech recognition by implementing deep learning algorithms. The Arabic visual datasets that have been collected contains 2400 records of Arabic digits and 960 records of Arabic phrases from 24 native s… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
1
1
1

Citation Types

0
3
0

Year Published

2022
2022
2024
2024

Publication Types

Select...
3
3
1

Relationship

0
7

Authors

Journals

citations
Cited by 7 publications
(3 citation statements)
references
References 30 publications
0
3
0
Order By: Relevance
“…The research includes important processes such as face detection, lip localization, feature extraction, classifier training, and word recognition. The accuracy of digit recognition was 94%, phrase recognition was 97%, and phrase and digit recognition was 93% [10].…”
Section: B Related Workmentioning
confidence: 98%
“…The research includes important processes such as face detection, lip localization, feature extraction, classifier training, and word recognition. The accuracy of digit recognition was 94%, phrase recognition was 97%, and phrase and digit recognition was 93% [10].…”
Section: B Related Workmentioning
confidence: 98%
“…For testing their proposed system, they used nine sentences uttered by 9 Arabic speakers and achieved 62.9% accuracy. Recently, researchers continued to improve the recognition process and performance at this level, as shown in [26]; they presented an Arabic dataset of sentences and numbers for visual speech recognition purposes, which contains 960 sentence videos and 2400 number videos from 24 speakers. They used concatenated frame images (CFIs) as preprocessing for their dataset, resulting in one single image containing utterance sequences, which are fed into their proposed model: visual geometry group (VGG-19) network with batch normalization for feature extraction and classification.…”
Section: Recognition Of Arabic At the Sentence Levelmentioning
confidence: 99%
“…Beyond robotics, deep learning finds applications in diverse domains including image analysis [156][157][158], video analysis [159][160][161], natural language generation [162,163], speech recognition [164][165][166], biometrics [167,168], text analytics [169,170], and natural language processing [171][172][173].…”
Section: Deep Learningmentioning
confidence: 99%