2022
DOI: 10.1038/s41598-022-12260-y
A study of transformer-based end-to-end speech recognition system for Kazakh language

Abstract: Today, the Transformer model, which allows parallelization and has its own internal attention mechanism, is widely used in the field of speech recognition. The great advantages of this architecture are its fast training speed and the absence of the sequential operations required by recurrent neural networks. In this work, Transformer models and an end-to-end model based on connectionist temporal classification were considered to build a system for automatic recognition of Kazakh speech. It is known that Kazakh is part of…

Cited by 27 publications (7 citation statements) · References 21 publications
“…The model is built on an encoder-decoder architecture, each consisting of multiple layers. Notably, the Transformer eliminates the need for recurrence, which is a staple in traditional models such as recurrent neural networks, thereby facilitating enhanced parallelization ( Orken et al, 2022 ). The architecture incorporates a multi-head self-attention mechanism and a position-wise fully connected feed-forward network in both the encoder and decoder layers.…”
Section: Proposed Methods (mentioning)
confidence: 99%
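The encoder-layer structure described in the statement above (multi-head self-attention followed by a position-wise feed-forward network, with no recurrence between time steps) can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: the per-head query/key/value projections are collapsed to slices of the input and layer normalization is omitted for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, num_heads):
    # x: (seq_len, d_model). Each head attends over the full sequence,
    # so all time steps are processed in parallel (no recurrence).
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    heads = []
    for h in range(num_heads):
        q = k = v = x[:, h * d_head:(h + 1) * d_head]  # illustrative: identity projections
        scores = q @ k.T / np.sqrt(d_head)             # scaled dot-product attention
        heads.append(softmax(scores) @ v)              # weighted sum of values
    return np.concatenate(heads, axis=-1)

def position_wise_ffn(x, w1, w2):
    # Applied independently and identically at every time step.
    return np.maximum(x @ w1, 0.0) @ w2

def encoder_layer(x, w1, w2, num_heads=2):
    # Residual connection around each of the two sub-layers.
    x = x + multi_head_self_attention(x, num_heads)
    return x + position_wise_ffn(x, w1, w2)

rng = np.random.default_rng(0)
frames = rng.normal(size=(5, 8))          # 5 acoustic frames, d_model = 8
w1 = rng.normal(size=(8, 16))
w2 = rng.normal(size=(16, 8))
out = encoder_layer(frames, w1, w2)
```

Note that the output shape matches the input shape, which is what lets these layers be stacked.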
“…In 2016, Abilhayer et al [141] constructed a continuous speech recognition system based on the GMM-HMM. Since 2019, the Institute of Information and Computational Technology and al-Farabi Kazakh National University published several papers on Kazakh speech recognition and verified the DNN-HMM system [142,143], BLSTM-CTC end-to-end system [144,145], and Transformer CTC/attention system [146] with their private data. At the same time, Beibut et al [147] constructed an LSTM-CTC end-to-end Kazakh ASR system based on transfer learning.…”
Section: KSC/KSC2 (mentioning)
confidence: 96%
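The CTC-based end-to-end systems surveyed above all share the same decoding rule for turning per-frame predictions into a label sequence: collapse consecutive repeats, then remove blanks. A minimal sketch of this standard best-path (greedy) CTC decoding, with illustrative label ids and blank index, not tied to any of the cited systems:

```python
from itertools import groupby

def ctc_greedy_decode(frame_ids, blank=0):
    """Best-path CTC decoding: take the most likely label per frame,
    merge runs of repeated labels, then drop the blank symbol."""
    collapsed = [label for label, _ in groupby(frame_ids)]
    return [label for label in collapsed if label != blank]

# Per-frame argmax over 9 frames; 0 is the blank symbol.
decoded = ctc_greedy_decode([0, 1, 1, 0, 2, 2, 2, 0, 1])
```

Here the repeated 1s and 2s collapse to single labels, and the blank between the final 2 and 1 is what allows the same label to appear twice in a row in the output.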
“…documentation; • voice control (use of commands); • interactive interaction with the patient. In Kazakhstan, trials of the Transformer speech recognition system were conducted and showed good results (an error rate of 3.7%) [3]. Automatic speech recognition is widely used for English, Chinese, Japanese, and French.…”
Section: List of abbreviations and notations (unclassified)