Interspeech 2021
DOI: 10.21437/interspeech.2021-1882

SynthASR: Unlocking Synthetic Data for Speech Recognition

citations
Cited by 23 publications
(10 citation statements)
references
References 0 publications
“…Synthetic speech generation techniques have recently gained attention in other related fields. Fazel et al [21] use synthetic speech generated with T2S to improve accuracy in ASR. Huang et al [62] use a machine translation technique to generate text to train an ASR language model in a low-resource language.…”
Section: Synthetic Speech Generation For Pronunciation Error Detection (mentioning)
confidence: 99%
“…The probability of pronunciation errors for all the words in a sentence can then be calculated using the Bayes rule [18]. In this new formulation, we move the complexity to learning the speech generation process that is well suited to the problem of limited speech availability [19][20][21]. The proposed method outperforms the state-of-the-art model [9] in detecting pronunciation errors in AUC metric by 41% from 0.528 to 0.749 on the GUT Isle Corpus of L2 Polish speakers.…”
Section: Introduction (mentioning)
confidence: 99%
“…Data augmentation using either labelled or unlabelled data [1,2] has been used to alleviate this data scarcity problem. One promising approach is speech data synthesis, which recently contributed to significant progress in domain adaptation [3], medication names recognition [4], accurate numeric sequences transcription [5], low-resource languages [6], etc. Most research in this areas has focused on text-to-speech (TTS) generation [7,8,9,10] and audio augmentation [11,12,13], and their effects on the resulting ASR accuracy.…”
Section: Introduction (mentioning)
confidence: 99%
“…Speech synthesis modelling techniques are advanced such that it is feasible to achieve almost natural-sounding output if sufficient data is available [1,2]. Specifically, current TTS techniques mostly require high-quality single-speaker recordings with text transcription for at least 2 hours of speech [3].…”
Section: Introduction (mentioning)
confidence: 99%