Comparison of acoustical models of GMM-HMM based for speech recognition in Hindi using PocketSphinx

Manasa, C. Sai; Priya, K. Jeeva; Gupta, Deepa

doi:10.1109/iccmc.2019.8819747

Cited by 8 publications

(7 citation statements)

References 15 publications

“…Speech recognition technology begins with the recognition of a single phoneme instead of recognizing a continuous word [27]. The phoneme recognition in the state-of-the-art speech recognition model is done with the help of the Gaussian Mixer Model (GMM)-Hidden Markov Model (HMM)-Language Model (LM) paradigm [22,27]. In the GMM-HMM-LM paradigm, GMM will process input speech feature vector (i.e.…”

Section: Related Work (mentioning)

confidence: 99%

“…In the GMM-HMM-LM paradigm, GMM will process input speech feature vector (i.e. Mel Frequency Cepstral Coefficient (MFCC) [30]) and emits emission probability for HMM [5,22,27]. The HMM together with LM compute the most likely sequence of phoneme with the help of a decoder [6].…”

Section: Related Work (mentioning)

confidence: 99%
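
The GMM-HMM pipeline described in the statement above can be illustrated with a minimal sketch: a diagonal-covariance GMM scores each feature frame per HMM state, and a Viterbi decoder picks the most likely state sequence. The function names, the two-state toy model, and the 1-D "features" are assumptions made for illustration only; the language model is omitted, and this is not the cited authors' or PocketSphinx's implementation.

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """Log p(x | state) for a diagonal-covariance GMM.

    weights: (K,), means and variances: (K, D), x: (D,).
    """
    x = np.asarray(x, dtype=float)
    diff = x - means
    # Per-component Gaussian log-density with diagonal covariance.
    log_comp = -0.5 * (np.sum(np.log(2 * np.pi * variances), axis=1)
                       + np.sum(diff ** 2 / variances, axis=1))
    a = np.log(weights) + log_comp
    m = a.max()
    return m + np.log(np.sum(np.exp(a - m)))  # log-sum-exp over components

def viterbi(log_emit, log_trans, log_init):
    """Most likely state path given (T, S) log emission scores."""
    T, S = log_emit.shape
    score = log_init + log_emit[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans      # cand[i, j]: move from i to j
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(S)] + log_emit[t]
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Hypothetical toy model: two states, each with a one-component GMM.
states = [
    (np.array([1.0]), np.array([[0.0]]), np.array([[1.0]])),
    (np.array([1.0]), np.array([[5.0]]), np.array([[1.0]])),
]
frames = np.array([[0.1], [-0.2], [4.9], [5.3]])  # stand-ins for MFCC vectors
log_emit = np.array([[gmm_log_likelihood(f, *s) for s in states] for f in frames])
log_trans = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))
log_init = np.log(np.array([0.5, 0.5]))
path = viterbi(log_emit, log_trans, log_init)
print(path)
```

In a full recognizer the states would belong to phoneme models and the decoder would also weigh language-model scores; here the transition matrix alone stands in for that structure.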

“…The CTC speech recognition model improves recognition accuracy in a noise environment. Both CTC and attention deep learning frameworks perform well and achieve an excellent result, but face challenges in incorporating the highly variable features of the natural language like accent style [22], various speaker attributes [25], speed of production of the speech signal [27], etc. The advancement in deep learning technology continues and Hamid et al [14], and Guiming et al [12], replaced the attention-RNN speech recognition model and CTC-RNN speech recognition model with the CNN speech recognition model.…”

Section: Related Work (mentioning)

confidence: 99%

“…However, for novel engineering applications where memory and computational resources are limited, the use of a broadband-based speech interaction system is costly. It also compromises privacy, battery life [26] as well as it highly depends on external factors, for example, network quality [16], network speed [1], latency [27], network traffic [36], etc.…”

Section: Introduction (mentioning)

confidence: 99%

Low Latency Based Convolutional Recurrent Neural Network Model for Speech Command Recognition

Kinkar; Jain

2021

ITC

This paper proposes a new speech command recognition model for novel engineering applications with limited resources. We built the proposed model as a Convolutional Recurrent Neural Network (CRNN). Using a CRNN instead of a Convolutional Neural Network (CNN) reduces the model parameters and memory requirement to fit the resource constraints. Furthermore, we insert a transmute and curtailment layer between the layers of the CRNN, which further reduces the model parameters and the number of floating-point operations to half of the CRNN requirement. The proposed model is tested on Google’s speech command dataset. The results show that the proposed CRNN model requires one third of the parameters of the CNN model. The number of parameters of the CRNN model is further reduced by 45%, and the number of floating-point operations by 2% to 12%, across different recognition tasks. The recognition accuracy of the proposed model is 96% on Google’s speech command dataset and 89% on laboratory recordings.

“…The acoustic model (AM) in an SR system creates the essential units of speech in the composed structure regarding a specific input signal [15]. The signal which acts as input is grafted up into overlapping periods of 10 ms with a 5 ms. At that point from each frame, 39 MFCC [1] co-efficient are extricated.…”

Section: CSR Acoustic Model (mentioning)

confidence: 99%
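
The framing step in the statement above can be sketched as follows. The quoted text is truncated at "with a 5 ms", so this sketch assumes the 5 ms is the frame shift; the 16 kHz sample rate and the split of the 39 coefficients into 13 static values plus first and second differences are likewise assumptions (a common convention, not something the quote states). `np.gradient` is a simple stand-in for the regression-window deltas that real toolkits use, and the static features are placeholders rather than real MFCCs.

```python
import numpy as np

def frame_signal(signal, sample_rate, frame_ms=10.0, shift_ms=5.0):
    """Split a 1-D signal into overlapping frames (rows)."""
    frame_len = int(round(sample_rate * frame_ms / 1000))
    shift = int(round(sample_rate * shift_ms / 1000))
    signal = np.asarray(signal)
    if len(signal) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(signal) - frame_len) // shift
    # Row t holds samples [t * shift, t * shift + frame_len).
    idx = np.arange(frame_len)[None, :] + shift * np.arange(n_frames)[:, None]
    return signal[idx]

def add_deltas(static):
    """Append first and second time differences to (T, D) static features."""
    delta = np.gradient(static, axis=0)
    delta2 = np.gradient(delta, axis=0)
    return np.hstack([static, delta, delta2])

sample_rate = 16000                       # assumed; not given in the quote
signal = np.zeros(sample_rate)            # one second of placeholder audio
frames = frame_signal(signal, sample_rate)
static = np.zeros((frames.shape[0], 13))  # placeholder for 13 static MFCCs per frame
features = add_deltas(static)
print(frames.shape, features.shape)
```

Under these assumptions each 10 ms frame holds 160 samples and consecutive frames start 80 samples apart, so half of every frame overlaps with the next.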

Creation and Instigation of Triphone based Big-Lexicon Speaker-Independent Continuous Speech Recognition Framework for Kannada Language

2019

IJITEE

This paper proposes a framework for comparably accurate speech recognition, specifically continuous speech recognition (CSR) based on triphone modelling for the Kannada language. To design the framework, features are extracted from Kannada speech data files using the well-known Mel-frequency cepstral coefficients (MFCC) technique and its transformations, such as linear discriminant analysis (LDA) and maximum likelihood linear transforms (MLLT). The system is then trained to estimate the hidden Markov model (HMM) parameters for continuous speech (CS) data. The continuous Kannada speech data is gathered from 2600 speakers (1560 men and 1040 women) aged 14 to 80 years. The speech data is acquired from different geographical regions of the state of Karnataka (one of the 29 states of India, situated in the south) under degraded conditions. It comprises 21,551 words spanning 30 locales. Both monophone and triphone models are evaluated in terms of word error rate (WER), and the results are compared with standard databases such as TIMIT and Aurora4. A significant reduction in WER is obtained for triphone models. The speech recognition (SR) rate is verified in both offline and online recognition modes for all the speakers. The results show that the recognition rate (RR) for the Kannada speech corpus improves on that of existing state-of-the-art databases.

Speech Recognition Mobile Application for Learning Iqra’ Using PocketSphinx

Nasution; Monika; Masnur

2022

Lecture Notes in Networks and Systems
