A Lipreading Model with DenseNet and E3D-LSTM

Bi, Chongyuan; Zhang, Jianhua; Yang, Li; Chen, Ping

doi:10.1109/icsai48974.2019.9010432

Cited by 4 publications

(2 citation statements)

References 5 publications

“…They have reached an average recognition rate of 35%. Bi et al [40] have developed a DenseNet network structure, the 3D convolution neural network with LSTM (E3DLSTM), which handles the time modeling for features extraction. The CTC layer is then utilized as a cascading time classification.…”

Section: Deep Learning Features Based Models (mentioning)

confidence: 99%

Deep Learning-Based Approach for Arabic Visual Speech Recognition

Ullah, Zahid, Algarni et al. 2022

Computers, Materials & Continua


Lip-reading technologies are progressing rapidly following the breakthrough of deep learning. Lip reading plays a vital role in many applications, such as human-machine communication and security. In this paper, we propose an effective lip-reading model for Arabic visual speech recognition based on deep learning algorithms. The collected Arabic visual dataset contains 2400 records of Arabic digits and 960 records of Arabic phrases from 24 native speakers. The primary aim is a high-performance model achieved by enhancing the preprocessing phase. First, we extract keyframes from our dataset. Second, we produce Concatenated Frame Images (CFIs) that represent the utterance sequence in a single image. Finally, VGG-19 is employed for visual feature extraction in our proposed model. We examined 10, 15, and 20 keyframes to compare two approaches in the proposed model: (1) the VGG-19 base model and (2) the VGG-19 base model with batch normalization. The results show that the second approach achieves greater accuracy: 94% for digit recognition, 97% for phrase recognition, and 93% for combined digit and phrase recognition on the test dataset. Therefore, our proposed model is superior to models based on CFI input.
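The preprocessing steps described in this abstract (keyframe extraction, then concatenation into a single image) can be illustrated with a minimal sketch. The abstract does not say how keyframes are chosen or how they are arranged, so evenly spaced sampling, the row-major grid layout, the zero padding, and the function names `select_keyframes` and `concatenate_frames` are all assumptions made for illustration, not the authors' method. Frames are modeled as plain nested lists of grayscale pixel values.

```python
def select_keyframes(frames, k):
    """Pick k frames at evenly spaced positions (an assumed selection rule)."""
    if k <= 0 or k > len(frames):
        raise ValueError("k must be between 1 and the number of frames")
    if k == 1:
        return [frames[0]]
    step = (len(frames) - 1) / (k - 1)
    return [frames[round(i * step)] for i in range(k)]


def concatenate_frames(keyframes, cols):
    """Tile equally sized frames (lists of pixel rows) into one grid image."""
    height, width = len(keyframes[0]), len(keyframes[0][0])
    blank = [[0] * width for _ in range(height)]
    # Pad with blank frames so the last grid row is complete (assumed policy).
    padded = list(keyframes) + [blank] * (-len(keyframes) % cols)
    image = []
    for start in range(0, len(padded), cols):
        grid_row = padded[start:start + cols]
        for r in range(height):
            # Join pixel row r of every frame in this grid row, left to right.
            image.append([px for frame in grid_row for px in frame[r]])
    return image


# Toy utterance: 30 frames of 2x2 pixels, each filled with its own index.
frames = [[[i, i], [i, i]] for i in range(30)]
keyframes = select_keyframes(frames, 10)
cfi = concatenate_frames(keyframes, cols=5)
print(len(cfi), len(cfi[0]))  # prints: 4 10
```

In a real pipeline the resulting single image would be resized to the input resolution expected by the feature extractor; that step is omitted here.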


“…In this context, it is seen that hybrid models give more successful results. In challenging datasets such as the LRW1000 dataset, it is seen to be low in hybrid models (Bi et al 2019, Xiao et al 2020). Real-life data sets are more challenging than other data sets.…”

Section: Figure 4 Usage Rates Of Multiple Datasets (mentioning)

confidence: 99%

Derin Öğrenme ile Dudak Okuma Üzerine Detaylı Bir Araştırma (A Detailed Study on Lip Reading with Deep Learning)

Erbey, Barışçı 2022

IJERAD


Deep learning studies have achieved very successful results in areas such as computer vision and speech recognition. Building on these successes, technologies that make people's lives easier are being developed. One of these technologies is speech recognition devices. Research shows that although speech recognition devices give good results in noise-free environments, their performance drops in noisy environments. With deep learning methods, the speech recognition problems encountered in noisy environments can be addressed using visual signals. Through computer vision, analyzing a speaker's lips to determine what the person is saying can improve the performance of speech recognition devices. This study introduces work that uses deep learning methods for lip reading, along with the relevant datasets. Based on this study, it can be said that lip reading is a field that warrants academic study.
