2019 16th International Multi-Conference on Systems, Signals & Devices (SSD) 2019
DOI: 10.1109/ssd.2019.8893184

Improving Low Resource Turkish Speech Recognition with Data Augmentation and TTS

Cited by 26 publications (16 citation statements)
References 5 publications
“…Self-training, or sequence-level knowledge distillation by text-to-text machine translation model, is the most effective way to utilize the huge ASR training data. On the other hand, synthesizing data by text-to-speech (TTS) has been demonstrated to be effective for low resource speech recognition task (Gokay and Yalcin, 2019; Ren et al, 2019). To the best of our knowledge, this is the first work to augment data by TTS for simultaneous speech-to-text translation tasks.…”
Section: Related Work (mentioning)
confidence: 99%
“…Data from other languages were additionally used in [64]. Acoustic data perturbation and speech synthesis were combined in [65], resulting in a 14.8% relative WER improvement. Semisupervised training refers to train the model with supervised data and unsupervised data through the confidence threshold.…”
Section: A. Data Augmentation (mentioning)
confidence: 99%
“…Due to the change of the signal length, the GMM-HMM system was used to realign the data after the speed perturbation [70]. Gokay et al [65] used speed perturbation, volume perturbation and a combination of the two for data augmentation. Kanda et al [72] studied three distortion methods of vocal tract length distortion, speech rate distortion, and frequency-axis random distortion.…”
Section: Acoustic Data Perturbation (mentioning)
confidence: 99%
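
The statement above names speed perturbation and volume perturbation as the acoustic augmentations. A minimal sketch of what these two operations generally look like follows; it is an illustration only, not the cited authors' implementation. The function names, the linear-interpolation resampler, the clipping range, and the example factors are all assumptions, since the excerpt gives no parameters.

```python
import math


def speed_perturb(samples, factor):
    """Resample so the clip plays `factor` times faster (hypothetical helper).

    Uses plain linear interpolation; real toolkits use better resamplers.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []
    last = len(samples) - 1
    out_len = max(1, round(len(samples) / factor))
    out = []
    for i in range(out_len):
        # Position in the original signal, clamped to the final sample.
        pos = min(i * factor, last)
        lo = int(math.floor(pos))
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append((1.0 - frac) * samples[lo] + frac * samples[hi])
    return out


def volume_perturb(samples, gain):
    """Scale amplitude by `gain`, clipping to [-1.0, 1.0] (assumed float audio)."""
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


if __name__ == "__main__":
    clip = [i / 10 for i in range(10)]  # toy 10-sample signal
    # Example factors are assumptions, not values from the cited paper.
    faster = speed_perturb(clip, 2.0)
    louder = volume_perturb(clip, 1.5)
    combined = volume_perturb(speed_perturb(clip, 2.0), 1.5)
    print(len(faster), len(louder), len(combined))
```

Changing the speed changes the signal length, which is why the statement mentions realigning the data after speed perturbation; volume perturbation leaves the length unchanged.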
“…In understanding grammatical ambiguous sentences, the system requires different time to process each sentence; the processing of this sentence depends on the number of characters understood [34]. The average sentence search value is 0.003275.…”
Section: The Speed in Detecting Ambiguous Words (mentioning)
confidence: 99%