Research on Tibetan Speech Recognition Based on the Am-do Dialect

Khysru, Kuntharrgyal; Wei, Jianguo; Dang, Jianwu

doi:10.32604/cmc.2022.027591

Cited by 2 publications

(4 citation statements)

References 22 publications


“…Therefore, after the speech input, the output sequence with the highest probability is selected, and after CTC decoding optimization, the final recognition result (x) is provided as output, where the operation formula is shown in Eq. (1).…”

Section: The Connectionist Temporal Classification-convolutional Neur... (mentioning)

confidence: 99%
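
The decoding step described in the statement above (pick the most probable output per frame, then apply the CTC collapse rule) can be sketched as follows. This is a minimal illustration of standard greedy (best-path) CTC decoding, not the cited paper's Eq. (1), which is not reproduced on this page. The function name `ctc_greedy_decode`, the toy alphabet, and the convention that index 0 is the blank symbol are assumptions made for the example. Note that best-path decoding returns the labels of the single most probable frame-level path, which is only an approximation of the most probable label sequence.

```python
from itertools import groupby


def ctc_greedy_decode(frame_probs, alphabet, blank=0):
    """Greedy (best-path) CTC decoding.

    frame_probs: list of per-frame probability lists, one entry per symbol.
    alphabet:    list mapping symbol index -> output character.
    blank:       index of the CTC blank symbol (assumed to be 0 here).
    """
    # 1. Take the most probable symbol index in every frame.
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    # 2. Merge consecutive repeats of the same index.
    collapsed = [index for index, _ in groupby(best_path)]
    # 3. Drop blanks; a blank between two equal symbols keeps them distinct.
    return "".join(alphabet[index] for index in collapsed if index != blank)


# Hypothetical toy input: 5 frames over the alphabet {blank, "a", "b"}.
toy_alphabet = ["-", "a", "b"]
toy_frames = [
    [0.10, 0.80, 0.10],  # a
    [0.20, 0.70, 0.10],  # a (repeat, merged with the previous frame)
    [0.90, 0.05, 0.05],  # blank (separates the two "a" symbols)
    [0.10, 0.70, 0.20],  # a
    [0.10, 0.20, 0.70],  # b
]
decoded = ctc_greedy_decode(toy_frames, toy_alphabet)
print(decoded)
```

On this toy input the frame-level path is a, a, blank, a, b, so the repeated "a" is merged once and the blank keeps the second "a" separate. A beam search over prefixes is the usual alternative when the single best path is not accurate enough.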

“…In this study, deep learning was integrated into the development stage of the CTC-CNN model. The main problems are: (1) training usually must solve a highly nonlinear optimization problem, which may easily lead to many local minima during the training process of the network, and (2) a too-long training time may lead to overfitting results. In practical application, the system is stable, efficient, and general-purpose, and more than 97.5% of the recognition rate of noisy speech can be achieved.…”

Section: Figure 12: Comparison Diagrams Of Applied Normalization (mentioning)

confidence: 99%

“…Deep learning has been predominantly used in visual recognition, speech recognition, natural language processing, biomedicine, and other fields, where it has achieved excellent results. However, the performance of the acoustic model directly affects the accuracy and stability of the final speech recognition system, such that it is necessary to consider its establishment, optimization, and efficiency in detail [1]. The experiments in this study employed CTC-CNN, which exhibits a better performance than the earlier commonly employed Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) acoustic model, to train the acoustic model.…”

Section: Introduction (mentioning)

confidence: 99%


Improving Speech Enhancement Framework via Deep Learning

Hsiao; Sung

2023

Computers, Materials & Continua


Speech plays an extremely important role in social activities. Many individuals suffer from a "speech barrier," which limits their communication with others. In this study, an improved speech recognition method is proposed that addresses the needs of speech-impaired and deaf individuals. A basic improved connectionist temporal classification convolutional neural network (CTC-CNN) architecture acoustic model was constructed by combining a speech database with a deep neural network. Acoustic sensors were used to convert the collected voice signals into text or corresponding voice signals to improve communication. The method can be extended to modern artificial intelligence techniques, with multiple applications such as meeting minutes, medical reports, and verbatim records for cars, sales, etc. For experiments, a modified CTC-CNN was used to train an acoustic model, which showed better performance than the earlier common algorithms. Thus a CTC-CNN baseline acoustic model was constructed and optimized, which reduced the error rate to about 18% and improved the accuracy rate.



“…Deep learning is predominantly used in visual recognition, speech recognition, natural language processing, biomedicine, and other fields, where it has achieved excellent results. However, in the field of speech recognition, the performance of the acoustic model directly affects the accuracy and stability of the final speech recognition system, such that it is necessary to consider its establishment, optimization, and efficiency in detail [1].…”

Section: Introduction (mentioning)

confidence: 99%

Speech Recognition via CTC-CNN Model

Sung; Kang; Hsiao

2022

Preprint


In modern society, human communication is increasingly frequent, and speech plays an extremely important role in social activities. Words can be used to express emotions and thoughts. However, numerous individuals are troubled by "language barriers", due to which their communication with others is limited. This study proposes a method to address the needs of speech-impaired and deaf-mute individuals. A basic deep neural network (DNN) acoustic model was established through a voice database combined with a deep neural network. A sound sensor was used to convert the collected voice signals and process them into text or corresponding voice signals to improve communication. This method can be extended to modern artificial intelligence technology, with diversified applications, such as verbatim transcripts of meeting minutes, medical reports, automotive, sales, etc. The results obtained in this study demonstrate the efficiency of the proposed method and discuss its significance.

