2022
DOI: 10.32604/cmc.2022.027591
|View full text |Cite
|
Sign up to set email alerts
|

Research on Tibetan Speech Recognition Based on the Am-do Dialect

Abstract: In China, Tibetan is usually divided into three major dialects: the Am-do, Khams and Lhasa dialects. The Am-do dialect evolved from ancient Tibetan and is a local variant of modern Tibetan. Although this dialect has its own specific historical and social conditions and development, there have been different degrees of communication with other ethnic groups, but all the abovementioned dialects developed from the same language: Tibetan. This paper uses the particularity of Tibetan suffixes in pronunciation and p… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
1
1

Citation Types

0
4
0

Year Published

2022
2022
2023
2023

Publication Types

Select...
1
1

Relationship

0
2

Authors

Journals

citations
Cited by 2 publications
(4 citation statements)
references
References 22 publications
0
4
0
Order By: Relevance
“…Therefore, after the speech input, the output sequence with the highest probability is selected, and after CTC decoding optimization, the final recognition result (x) is provided as output, where the operation formula is shown in Eq. (1).…”
Section: The Connectionist Temporal Classification-convolutional Neur...mentioning
confidence: 99%
See 2 more Smart Citations
“…Therefore, after the speech input, the output sequence with the highest probability is selected, and after CTC decoding optimization, the final recognition result (x) is provided as output, where the operation formula is shown in Eq. (1).…”
Section: The Connectionist Temporal Classification-convolutional Neur...mentioning
confidence: 99%
“…In this study, deep learning was integrated into the development stage of the CTC-CNN model. The main problems are: (1) training usually must solve a highly nonlinear optimization problem, which may easily lead to many local minima during the training process of the network, and (2) a too-long training time may lead to overfitting results. In practical application, the system is stable, efficient, and general-purpose, and more than 97.5% of the recognition rate of noisy speech can be achieved.…”
Section: Figure 12: Comparison Diagrams Of Applied Normalizationmentioning
confidence: 99%
See 1 more Smart Citation
“…Deep learning is predominantly used in visual recognition, speech recognition, natural language processing, biomedicine, and other fields, where it has achieved excellent results. However, in the field of speech recognition, the performance of the acoustic model directly affects the accuracy and stability of the final speech recognition system, such that it is necessary to consider its establishment, optimization, and efficiency in detail [1].…”
Section: Introductionmentioning
confidence: 99%