Enhanced Marathi Speech Recognition Facilitated by Grasshopper Optimisation-Based Recurrent Neural Network

Bachate, Ravindra P.; Sharma, Ashok; Singh, Amar; Aly, Ayman A.; Alghtani, Abdulaziz H.; Le, Dac‐Nhuong

doi:10.32604/csse.2022.024214

Cited by 2 publications

(2 citation statements)

References 22 publications


“…When choosing a modeling unit, it is necessary to consider whether the modeling unit fully represents the context information and whether it can describe the generalization of acoustic features. Based on the establishment of the baseline acoustic model, the error rate of the speech-to-pinyin sequence was significantly reduced in this study by continuously optimizing the acoustic model [7].…”

Section: The Connectionist Temporal Classification-convolutional Neur... (mentioning)

confidence: 99%

Improving Speech Enhancement Framework via Deep Learning

Hsiao, Sung

2023

Computers, Materials & Continua


Speech plays an extremely important role in social activities. Many individuals suffer from a "speech barrier," which limits their communication with others. In this study, an improved speech recognition method is proposed that addresses the needs of speech-impaired and deaf individuals. An improved acoustic model based on a connectionist temporal classification convolutional neural network (CTC-CNN) architecture was constructed by combining a speech database with a deep neural network. Acoustic sensors were used to convert the collected voice signals into text or corresponding voice signals to improve communication. The method can be extended to modern artificial intelligence techniques, with multiple applications such as meeting minutes, medical reports, and verbatim records for cars, sales, etc. In the experiments, a modified CTC-CNN was used to train an acoustic model, which performed better than earlier commonly used algorithms. Thus, a CTC-CNN baseline acoustic model was constructed and optimized, which reduced the error rate to about 18% and improved the accuracy rate.
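To make the CTC step mentioned in the abstract more concrete, the following is a minimal sketch of greedy (best-path) CTC decoding: pick the most likely label per frame, collapse consecutive repeats, then drop the blank symbol. It is an illustration only and not the cited paper's implementation; the function name `ctc_greedy_decode`, the blank index `0`, and the toy score matrix are all assumptions.

```python
from itertools import groupby

BLANK = 0  # assumption: label 0 is the CTC blank symbol


def ctc_greedy_decode(frame_scores, blank=BLANK):
    """Best-path CTC decoding over per-frame label scores.

    frame_scores: list of frames, each a list of scores (one per label).
    Returns the decoded label sequence as a list of ints.
    """
    # 1. Take the highest-scoring label in every frame.
    best_path = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    # 2. Collapse runs of identical consecutive labels.
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Remove blanks; a blank between two equal labels keeps both.
    return [label for label in collapsed if label != blank]


# Toy example (hypothetical scores): 5 frames, 3 labels (0 = blank).
toy_scores = [
    [0.1, 0.8, 0.1],    # label 1
    [0.1, 0.7, 0.2],    # label 1 (repeat, collapsed)
    [0.9, 0.05, 0.05],  # blank
    [0.2, 0.7, 0.1],    # label 1 (kept, separated by blank)
    [0.1, 0.2, 0.7],    # label 2
]
decoded = ctc_greedy_decode(toy_scores)
```

In practice the scores would come from the acoustic model's output layer, and beam search with a language model is a common alternative to greedy decoding.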


“…Speaker markers: Speaker selection, taking turns, elaboration, and digression. After providing definitions of discourse markers, turns, floor control types/turn segments, topic units, and actions, a list of verbal and non-verbal discourse markers is specified and grouped into subcategories according to their semantic relationship [3].…”

Section: Speech Recognition (mentioning)

confidence: 99%

Speech Recognition via CTC-CNN Model

Sung, Kang, Hsiao

2023

Computers, Materials & Continua


In a speech recognition system, the acoustic model is an important underlying model, and its accuracy directly affects the performance of the entire system. This paper describes the construction and training of the acoustic model in detail and studies the connectionist temporal classification (CTC) algorithm, which plays an important role in the end-to-end framework. A convolutional neural network (CNN) is combined with a CTC-based acoustic model to improve the accuracy of speech recognition. The study uses a sound sensor, ReSpeaker Mic Array v2.0.1, to convert the collected speech signals into text or corresponding speech signals to improve communication and reduce noise and hardware interference. The baseline acoustic model faces challenges such as long training time, a high error rate, and a certain degree of overfitting. The model is trained through continuous design and improvement of the relevant acoustic model parameters, and the best-performing model is then selected according to the evaluation index; this model reduces the error rate to about 18%, thus improving the accuracy rate. Finally, comparative verification was carried out on the selection of acoustic feature parameters, the selection of modeling units, and the speaker's speech rate, which further verified the excellent performance of the CTCCNN_5 + BN + Residual model structure. To train and verify the CTC-CNN baseline acoustic model, the study uses the THCHS-30 and ST-CMDS speech data sets as training data; after 54 epochs of training, the word error rate on the training set is 31%, and the word error rate on the test set is stable at about 43%. The experiment also considers surrounding environmental noise. At a noise level of 80-90 dB, the accuracy rate is 88.18%, which is the worst performance among all levels. In contrast, at 40-60 dB, the accuracy was as high as 97.33% due to less noise pollution.
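The word error rate quoted in the abstract is conventionally computed as the word-level edit distance (substitutions + deletions + insertions) between the reference and the recognized text, divided by the number of reference words. The sketch below shows that standard definition; it is not taken from the cited paper, and the function name `word_error_rate` and the example sentences are assumptions for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level Levenshtein distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution ("sat" -> "sit") and one
# deletion (the second "the") against a six-word reference.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

For a corpus such as THCHS-30, the same computation would be applied to whichever modeling unit is evaluated (for example pinyin syllables rather than space-separated words), with the tokenization step adapted accordingly.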

