2019
DOI: 10.1515/comp-2019-0004
Bidirectional deep architecture for Arabic speech recognition

Abstract: Nowadays, real-life constraints necessitate controlling modern machines through human intervention by means of the sensory organs. The voice is one of the human senses that can control/monitor modern interfaces. In this context, Automatic Speech Recognition is principally used to convert natural speech into computer text as well as to perform actions based on the instructions given by the human. In this paper, we propose a general framework for Arabic speech recognition that uses Long Short-Term Memory (LSTM…
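The abstract is truncated here, but the title and the LSTM mention point to a bidirectional recurrent model over frame-level speech features. Below is a minimal, hypothetical PyTorch sketch of such a bidirectional LSTM classifier for isolated-word or command recognition; the layer sizes, 39-dimensional MFCC input, mean pooling over time, and class count are illustrative assumptions, not the authors' exact configuration.

```python
# Hypothetical sketch of a bidirectional LSTM classifier over MFCC frames,
# in the spirit of the architecture the abstract describes (assumptions noted above).
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    def __init__(self, n_features=39, hidden=128, n_classes=10):
        super().__init__()
        # Bidirectional LSTM reads the MFCC frame sequence in both directions.
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):          # x: (batch, time, n_features)
        out, _ = self.lstm(x)      # (batch, time, 2*hidden)
        pooled = out.mean(dim=1)   # average over time for an utterance-level label
        return self.fc(pooled)     # class logits, one per isolated word/command

# Example: a batch of 4 utterances, 120 frames of 39-dim MFCC features each.
model = BiLSTMClassifier()
logits = model(torch.randn(4, 120, 39))
print(logits.shape)  # torch.Size([4, 10])
```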

Cited by 40 publications (15 citation statements). References 28 publications.
“…Zerari et al. identified labelled basic noun phrases by considering the internal structural features of the phrases, using the corresponding lexical sequences as rules and then clipping the lexical sequences to obtain rule sets. However, the accuracy of recognition needs to be improved [8]. Cui et al. selected relevant samples from many stores and used some conventional methods for feature extraction of spectral and rhyme types for a few selected types [9].…”
Section: Related Work (mentioning, confidence: 99%)
“…Indeed, the performance of cov-MLR with N = 100 is 98.7 ± 0.1%, better than that of a mean-based decoder with an echo state network of N = 900 neurons (Alalshekmubarak and Smith, 2013), but obtained with 9 times fewer neurons in the reservoir. Furthermore, the performance is at the same level as that obtained with a much more complex network model, such as the long short-term memory network in (Zerari et al., 2019) (98.77%).…”
Section: Results (mentioning, confidence: 53%)
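For illustration only: the contrast this excerpt draws is between a mean-based readout of reservoir states and a covariance-based multinomial logistic regression (cov-MLR) readout. A rough sketch under that reading, with toy data, sklearn, and function names that are assumptions rather than the cited implementations:

```python
# Hypothetical comparison of a mean-based readout vs. a covariance-based
# multinomial logistic regression readout over reservoir states.
# Shapes, names, and toy data are assumptions, not the cited code.
import numpy as np
from sklearn.linear_model import LogisticRegression

def mean_features(states):              # states: (time, n_neurons)
    return states.mean(axis=0)          # first-order statistics only

def cov_features(states):               # states: (time, n_neurons)
    c = np.cov(states, rowvar=False)    # (n_neurons, n_neurons) covariance
    iu = np.triu_indices_from(c)        # keep the upper triangle only
    return c[iu]                        # flattened second-order features

# Toy data: 20 "utterances", 50 time steps, 100 reservoir neurons, 4 classes.
rng = np.random.default_rng(0)
X = [rng.standard_normal((50, 100)) for _ in range(20)]
y = rng.integers(0, 4, size=20)

mean_clf = LogisticRegression(max_iter=1000).fit([mean_features(s) for s in X], y)
cov_clf = LogisticRegression(max_iter=1000).fit([cov_features(s) for s in X], y)
```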
“…Zerari et al. [35] proposed a system to recognise isolated Arabic utterances in two ASR applications: a) TV spoken-command recognition and b) spoken-digit recognition. This system consists of several steps: first, pertinent features were extracted from the natural speech, namely (static and dynamic) MFCC features and Filter Bank (FB) coefficients.…”
Section: Related Work (mentioning, confidence: 99%)
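The excerpt above describes a front end built from static and dynamic MFCC features plus Filter Bank (FB) coefficients. A hedged sketch of that kind of feature extraction using librosa (the toolkit, sampling rate, and coefficient counts are assumptions; the paper does not specify them):

```python
# Sketch of a static + dynamic MFCC and log filter-bank front end.
# The caller supplies a wav file path; parameter values are illustrative.
import librosa
import numpy as np

def extract_features(wav_path, sr=16000, n_mfcc=13, n_mels=26):
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # static MFCCs
    d1 = librosa.feature.delta(mfcc)                          # dynamic: delta
    d2 = librosa.feature.delta(mfcc, order=2)                 # dynamic: delta-delta
    fbank = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    )                                                         # log filter-bank energies
    return np.vstack([mfcc, d1, d2]), fbank                   # (39, T) and (26, T)
```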