2019
DOI: 10.1051/itmconf/20192401012

Continuous Speech Recognition of Kazakh Language

Abstract: This article describes methods for building a continuous speech recognition system for the Kazakh language. Compared with other languages, research on Kazakh speech recognition began relatively recently, after the country gained independence, and Kazakh remains a low-resource language. A large amount of data is required to build a reliable system and evaluate it accurately. A database has been created for the Kazakh language, consisting of speech signals and corresponding transcriptions. …

Cited by 12 publications (5 citation statements)
References 2 publications
“…Text files have been normalized by removing all the unnecessary characters and representing in lower case. The overall duration of our speech corpus, slightly inferior compared to corpuses that has been described here [25,26], where they have biggest speech corpus for Kazakh language. They have collected around 30 hours of data with 200 different speakers with different genders and ages.…”
Section: Datasets and Preprocessing (mentioning)
confidence: 83%
“…In 2016, Abilhayer et al [141] constructed a continuous speech recognition system based on the GMM-HMM. Since 2019, the Institute of Information and Computational Technology and al-Farabi Kazakh National University published several papers on Kazakh speech recognition and verified the DNN-HMM system [142,143], BLSTM-CTC end-to-end system [144,145], and Transformer CTC/attention system [146] with their private data. At the same time, Beibut et al [147] constructed an LSTM-CTC end-to-end Kazakh ASR system based on transfer learning.…”
Section: Ksc/ksc2 (mentioning)
confidence: 96%
“…Early databases generally contained reading speech recorded by microphones in an office environment (even high-quality speech recorded in a sound-proof studio [142,143]). Speech data recorded in this way are characterized by clear pronunciation, small noise interference, and a single channel.…”
Section: The Diversity Of Data Sources (mentioning)
confidence: 99%
“…In this work, we use such CRF-based NER system as one baseline and make comparison to our deep learning models. Recently, deep learning models including biLSTM have obtained a significant success on various natural languages processing tasks, such as POS tagging [28,13,26,25], NER [4,10], machine translation [2,8], word segmentation [10] and on other fields like speech recognition [15,15,7,16,1]. As the state-of-the-art of NER, in the study [12], the authors have explored various neural architectures for NER including the language independent character-based biLSTM-CRF models.…”
Section: Related Work (mentioning)
confidence: 99%