Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2023
DOI: 10.18653/v1/2023.acl-long.42

Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation

Abstract: The performance of automatic speech recognition (ASR) systems has advanced substantially in recent years, particularly for languages for which a large amount of transcribed speech is available. Unfortunately, for low-resource languages, such as minority languages, regional languages or dialects, ASR performance generally remains much lower. In this study, we investigate whether data augmentation techniques could help improve low-resource ASR performance, focusing on four typologically diverse minority language…

Cited by 7 publications (2 citation statements)
References 23 publications
“…This matrix is referred to as the mask. The estimated mask aims to closely resemble the ideal ratio mask (IRM) [13], which is defined in Equation (3).…”
Section: Mask-based Separation Methods In Time Frequency Domains (mentioning)
confidence: 99%
“…It is not only hampered by acoustic interference by background noise such as traffic, crowd noise, but also speaker variability like accents, dialects, microphone quality and so on. Bartelds et al reveal issues such as the diminishing returns of data augmentation in data-rich environments and the oversight of sociolinguistic factors, which are critical in diverse linguistic contexts [3]. Furthermore, Li et al discuss the challenges faced by ASR systems in handling continuous speech sequences and streaming speech, highlighting the need for more sophisticated models to tackle these issues [4].…”
Section: Introduction (mentioning)
confidence: 99%