Interspeech 2022
DOI: 10.21437/interspeech.2022-770

Data Augmentation for Low-Resource Quechua ASR Improvement

Abstract: Automatic Speech Recognition (ASR) is a key element in new services that help users interact with an automated system. Deep learning methods have made it possible to deploy systems with word error rates below 5% for English ASR. However, these methods are practical only for languages with hundreds or thousands of hours of audio and corresponding transcriptions. For the so-called low-resource languages to speed up the availability of resources that can improve the performance of their ASR…

Cited by 3 publications (3 citation statements)
References 21 publications
“…Data augmentation using either labelled or unlabelled data [1,2] has been used to alleviate this data scarcity problem. One promising approach is speech data synthesis, which recently contributed to significant progress in domain adaptation [3], medication name recognition [4], accurate transcription of numeric sequences [5], low-resource languages [6], etc. Most research in this area has focused on text-to-speech (TTS) generation [7,8,9,10] and audio augmentation [11,12,13], and their effects on the resulting ASR accuracy.…”
Section: Introduction (mentioning)
confidence: 99%
“…Zevallos et al. show a good improvement over the non-augmented dataset [6]. In [6], synthetic utterances were created by first replac-…”
Section: Introduction (mentioning)
confidence: 99%