A Bilingual Kazakh-Russian System for Automatic Speech Recognition and Synthesis

Khomitsevich, Olga; Mendelev, Valentin; Tomashenko, Natalia; Rybin, Sergey V.; Medennikov, Ivan; Kudubayeva, Saule

doi:10.1007/978-3-319-23132-7_3

Cited by 19 publications

(9 citation statements)

References 8 publications


“…Similarly, Mamyrbayev et al (2019) collected 76 hours of data using a professional recording booth which were further extended to 123 hours in Mamyrbayev et al (2020). Khomitsevich et al (2015) utilized 147 hours of bilingual Kazakh-Russian speech data to build code-switching ASR systems. Shi et al (2017) released 78 hours of transcribed Kazakh speech data recorded by 96 students from China.…”

Section: Related Work (mentioning)

confidence: 99%

“…Additionally, we filtered out texts entirely consisting of Russian words. Texts consisting of mixed Kazakh-Russian utterances were kept, because there are many borrowed Russian words in Kazakh, and it is common practice among Kazakh speakers to code-switch between Kazakh and Russian (Khomitsevich et al, 2015). Next, we split the texts into sentences and removed sentences consisting of more than 25 words.…”

Section: Text Collection and Cleaning (mentioning)

confidence: 99%


A Crowdsourced Open-Source Kazakh Speech Corpus and Initial Speech Recognition Baseline

Khassanov, Mussakhojayeva, Mirzakhmetov et al. 2021

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume


We present an open-source speech corpus for the Kazakh language. The Kazakh speech corpus (KSC) contains around 332 hours of transcribed audio comprising over 153,000 utterances spoken by participants from different regions and age groups, as well as both genders. It was carefully inspected by native Kazakh speakers to ensure high quality. The KSC is the largest publicly available database developed to advance various Kazakh speech and language processing applications. In this paper, we first describe the data collection and preprocessing procedures followed by a description of the database specifications. We also share our experience and challenges faced during the database construction, which might benefit other researchers planning to build a speech corpus for a low-resource language. To demonstrate the reliability of the database, we performed preliminary speech recognition experiments. The experimental results imply that the quality of audio and transcripts is promising (2.8% character error rate and 8.7% word error rate on the test set). To enable experiment reproducibility and ease the corpus usage, we also released an ESPnet recipe for our speech recognition models.


“…Similarly, the authors of [26] presented the first publicly available speech synthesis dataset for Kazakh. Previously, the Kazakh language was part of several multilingual studies under the IARPA's Babel project [9,16,18], and it was also explored in the context of Kazakh-Russian [21,33] and Kazakh-English [5] code-switching.…”

Section: Related Work (mentioning)

confidence: 99%

A Study of Multilingual End-to-End Speech Recognition for Kazakh, Russian, and English

Mussakhojayeva, Khassanov, Varol 2021

Preprint


We study training a single end-to-end (E2E) automatic speech recognition (ASR) model for three languages used in Kazakhstan: Kazakh, Russian, and English. We first describe the development of multilingual E2E ASR based on Transformer networks and then perform an extensive assessment on the aforementioned languages. We also compare two variants of output grapheme set construction: combined and independent. Furthermore, we evaluate the impact of LMs and data augmentation techniques on the recognition performance of the multilingual E2E ASR. In addition, we present several datasets for training and evaluation purposes. Experiment results show that the multilingual models achieve comparable performances to the monolingual baselines with a similar number of parameters. Our best monolingual and multilingual models achieved 20.9% and 20.5% average word error rates on the combined test set, respectively. To ensure the reproducibility of our experiments and results, we share our training recipes, datasets, and pre-trained models.


“…Similarly, Mamyrbayev et al (2019) collected 76 hours of data using professional recording booth which was further extended to 123 hours in (Mamyrbayev et al, 2020). Khomitsevich et al (2015) utilized 147 hours of bilingual Kazakh-Russian speech corpus to build code-switching ASR systems. Shi et al (2017) released 78 hours of transcribed Kazakh speech data recorded by 96 students from China.…”

Section: Related Work (mentioning)

confidence: 99%

A Crowdsourced Open-Source Kazakh Speech Corpus and Initial Speech Recognition Baseline

Khassanov, Mussakhojayeva, Mirzakhmetov et al. 2020

Preprint


We present an open-source speech corpus for the Kazakh language. The Kazakh speech corpus (KSC) contains around 335 hours of transcribed audio comprising over 154,000 utterances spoken by participants from different regions, age groups, and genders. It was carefully inspected by native Kazakh speakers to ensure high quality. The KSC is the largest publicly available database developed to advance various Kazakh speech and language processing applications. In this paper, we first describe the data collection and preprocessing procedures followed by the description of the database specifications. We also share our experience and challenges faced during database construction. To demonstrate the reliability of the database, we performed preliminary speech recognition experiments. The experimental results imply that the quality of audio and transcripts is promising. To enable experiment reproducibility and ease the corpus usage, we also released the ESPnet recipe (https://issai.nu.edu.kz/kz-speech-corpus/).

