2021
DOI: 10.1007/978-3-030-87802-3_29

Synthesis Speech Based Data Augmentation for Low Resource Children ASR

Abstract: Successful speech recognition for children requires a large amount of training data with sufficient speaker variability. Collecting such a training database of children's voices is challenging and very expensive for a zero/low-resource language like Punjabi. In this paper, the data scarcity issue of the low-resource language Punjabi is addressed through two levels of augmentation. The original training corpus is first augmented by modifying the prosody parameters for pitch and speaking rate. Our results show that th…

Cited by 7 publications (3 citation statements)
References 28 publications
“…Most of these approaches consist of various data augmentation techniques for increasing the amount of usable training data. Text-to-Speech based data augmentations as introduced by [14] and [17], where ASR models are finetuned using synthetic data, have not shown significant increases in the accuracy of child ASR. Generative Adversarial Network (GAN) based augmentation [18], [19], [20] has also been explored to increase the amount of labeled data with acoustic attributes like those of child speech.…”
Section: A Related Work (mentioning)
confidence: 99%
“…ASR is an important and useful tool for speech researchers. It forms the basis of speech understanding [11] when combined with advanced language models, but also finds applications in generative models and for training improved Text-To-Speech (TTS) models [12], [13], [14]. The interrelationship between ASR and TTS is further described in [15].…”
Section: Introduction (mentioning)
confidence: 99%
“…It allows to synthesise speech for arbitrary sentences and therefore to quickly adapt an ASR system to new commands and domains and a single model can handle any number of speakers. TTS-based data augmentation has already been applied to ASR for low-resource languages and children's speech [8]. ASR and TTS are also naturally linked, corresponding to speech perception and speech production, and joint training in a speech chain has been proposed [9].…”
Section: Introduction (mentioning)
confidence: 99%