Development of High-Performance and Large-Scale Vietnamese Automatic Speech Recognition Systems

Truong, Do Quoc; Phuong, Pham Ngoc; Tung, Tran Hoang; Mai, Luong Chi

doi:10.15625/1813-9663/34/4/13165

Cited by 2 publications

(7 citation statements)

References 9 publications

“…Table 1. The differences among tones in Vietnamese are illustrated in Table 2 [7]. The monosyllabic nature of Vietnamese, combined with its tonal system, adds a layer of complexity to the language.…”

Section: Characteristics Of Vietnamese Language (mentioning)

confidence: 99%

“…All audio files were converted to the wave format with a sampling frequency of 16 kHz and PCM 16 bits. In [7], three Vietnamese speech corpora were introduced. Those corpora include two small reading speech corpora with a total of 6 h and 6.5 h, respectively, and a large-scale speech corpus with 900 h. The large-scale speech corpus was collected by crawling untranscribed audio from various resources, such as movies, YouTube movies, and electronic newspapers.…”

Section: Previous Work On Vietnamese Speech Corpus (mentioning)

confidence: 99%

“…In the past twenty years, there have been a lot of efforts to increase the number of Vietnamese speech corpora, but the number of large-scale Vietnamese speech corpora has been limited. Most public corpora of Vietnamese speech are reading speech, like the FTRI corpus and the corpora in [2,3,8], or part of reading speech, such as VinBigdata-VLSP2020 or two small corpora in [7]. Therefore, the available corpora might not be compatible with real-life scenarios for spoken language, like conversational and discussion recognition.…”

Section: Previous Work On Vietnamese Speech Corpus (mentioning)

confidence: 99%

“…Many current Vietnamese corpora are small, ranging from a few hours to tens of hours, such as VIVOS and VLSP 2018 [2,3,5]. Vietnamese corpora of more than 100 h, such as VinBigdata-VLSP2020 and the corpora in [6,7], are rare, and most of them are not open-access, like the FTRI corpus collected by the FPT Technology Research Institute (FTRI), MICA VNSpeechCorpus [8], and the corpora in [6,7]. However, those corpora either lack high-quality sound or are not open-access.…”

Section: Introduction (mentioning)

confidence: 99%

“…Most previous works for Vietnamese automatic speech recognition (ASR) tasks investigated traditional statistical ASR models like HMM/GMM, deep neural networks (DNNs) or hybrid models, a combination of HMM/GMM and DNN models [3,6,7,9-11]. Recently, end-to-end (E2E) models in the field of automatic speech recognition have gained considerable attention from both academic and industrial perspectives [12-17].…”

Section: Introduction (mentioning)

confidence: 99%

Automatic Speech Recognition of Vietnamese for a New Large-Scale Corpus

Tran, Kim, et al. 2024

Electronics

Vietnamese is an under-resourced language. The demand for a large-scale, high-quality Vietnamese speech corpus is growing. We introduce a new large-scale Vietnamese speech corpus with 100.5 h collected from various audio sources on the Internet. The raw collected audio was processed to obtain clean speech. Transcription of the clean speech was made manually. The new corpus was analyzed in terms of gender, topic and regional dialect. The results show that the new corpus has good diversity of genders, topics and regional dialects. We also evaluated the new corpus using state-of-the-art automatic speech recognition models like LAS and Speech-Transformer for multiple scenarios. This is the first time that these models have been applied to Vietnamese speech recognition and obtained reasonable results. Simulation results showed that the new corpus would be a good dataset for Vietnamese ASR tasks because it correctly reflects the difficulties of recognizing speech from different dialects and topic domains.
