Interspeech 2019
DOI: 10.21437/interspeech.2019-2730

End-to-End Text-to-Speech for Low-Resource Languages by Cross-Lingual Transfer Learning

Abstract: End-to-end text-to-speech (TTS) has shown great success on large quantities of paired text plus speech data. However, laborious data collection remains difficult for at least 95% of the languages over the world, which hinders the development of TTS in different languages. In this paper, we aim to build TTS systems for such low-resource (target) languages where only very limited paired data are available. We show such TTS can be effectively constructed by transferring knowledge from a high-resource (source) lan…

Cited by 45 publications (37 citation statements)
References 19 publications
“…ASR-TTS proposed by [30] need additional ASR to assist TTS learning. Mapping-based DTL is explored in [31] by adding a phonetic transformation network (PTN) model to learn a mapping between source and target linguistic symbols. An ASR system is used to train PTN separately.…”
Section: B Low-resource Problem (mentioning)
confidence: 99%
“…Our strategy is simpler than [30] as it does not need additional system such as ASR. Similar to [31], we apply DTL approach. However, our DTL is network-based approach that is more flexible than mapping-based DTL applied by [31] in which with multi stages of transfer learning the previous learned DNN parameters can be passed on to a larger network.…”
Section: B Low-resource Problem (mentioning)
confidence: 99%
“…We randomly select 200 sentences for IR test and 20 sentences for MOS test, following the same test configuration in English. Each audio is listened by at least 5 testers for IR test and 20 testers for MOS test, who are all native Lithuanian speakers. [Footnote 9: The audio samples and complete experiments results on IR and MOS for TTS, and WER and CER for ASR can be founded in https://speechresearch.github.io/lrspeech.]…”
Section: Apply To Truly Low-resource Language: Lithuanian (mentioning)
confidence: 99%