Towards Building an Intelligent Voice System for Kazakh: Acoustic Database and System Design

Yessenbayev, Zhandos; Karabalayeva, Muslima; Shamayeva, Firuza

doi:10.1109/eurosim.2013.75

Cited by 1 publication (1 citation statement)

References 11 publications


“…Regarding the first group, Yessenbayev et al [14] conducted a comprehensive study to address the challenge of automatic, speaker-independent recognition of continuous Kazakh speech on a specific vocabulary basis in the presence of noise. According to the author, the proposed system achieved successful results in tasks such as phonetic recognition of English speech and recognition of continuous Kazakh speech, with a relative improvement in the recognition quality of up to 20%.…”

Section: XLSR-53 (citation type: mentioning; confidence: 99%)

Kazakh Speech Recognition: Wav2vec2.0 vs. Whisper

Kozhirbayev

2023

JAIT


In recent years, neural models trained on large multilingual text or speech corpora have shown great potential for improving the status of under-resourced languages. This paper evaluates three state-of-the-art speech recognition models, Facebook's Wav2Vec2.0 and Wav2Vec2-XLS-R and OpenAI's Whisper, on Kazakh. The objective is to assess how effectively these models transcribe Kazakh speech and to compare their performance with existing supervised Automatic Speech Recognition (ASR) systems. The study also explores whether pre-training on data from other languages is useful and whether fine-tuning on target-language data improves model performance, thereby offering insight into the effectiveness of pretrained multilingual models in under-resourced settings. The Wav2Vec2.0 model achieved a Character Error Rate (CER) of 2.8% and a Word Error Rate (WER) of 8.7% on the test set, closely matching the best result obtained by the end-to-end Transformer model. The large Whisper model achieved a CER of approximately 4% on the test set. These results can contribute to the development of robust and efficient ASR systems for Kazakh, benefiting applications such as speech-to-text translation, voice assistants, and speech-based communication tools.
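The abstract reports CER and WER without defining them. Both are conventionally computed from the Levenshtein edit distance between a reference transcript and the model output: WER = (S + D + I) / N, where S, D and I are word substitutions, deletions and insertions and N is the number of reference words; CER applies the same formula to characters. The following is a minimal Python sketch of these standard definitions, not the evaluation script used in the study. Text normalization (casing, punctuation) and whether spaces count toward CER are assumptions here and differ between toolkits; the Kazakh example sentence is hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    # prev[j] = distance between the first (i - 1) ref tokens and the first j hyp tokens
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference token r
                curr[j - 1] + 1,         # insert hypothesis token h
                prev[j - 1] + (r != h),  # substitute (no cost if tokens match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate in percent: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate in percent; spaces are counted as characters (an assumption)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical example: "қ" misrecognized as "к" twice in one word.
    ref = "мен қазақша сөйлеймін"
    hyp = "мен казакша сөйлеймін"
    print(f"WER = {wer(ref, hyp):.2f}%")  # one wrong word out of three
    print(f"CER = {cer(ref, hyp):.2f}%")  # two wrong characters out of 21
```

The example illustrates why CER is usually lower than WER for the same output, as in the reported 2.8% vs 8.7%: a single misrecognized character makes the whole word count as an error.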
