2018
DOI: 10.48550/arxiv.1809.10460
Preprint

Sample Efficient Adaptive Text-to-Speech

Abstract: We present a meta-learning approach for adaptive text-to-speech (TTS) with few data. During training, we learn a multi-speaker model using a shared conditional WaveNet core and independent learned embeddings for each speaker. The aim of training is not to produce a neural network with fixed weights, which is then deployed as a TTS system. Instead, the aim is to produce a network that requires few data at deployment time to rapidly adapt to new speakers. We introduce and benchmark three strategies: (i) learning…
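The abstract describes a shared conditional core with an independent learned embedding per speaker, where adaptation to a new speaker can mean fitting only that speaker's embedding while the shared weights stay frozen. A minimal sketch of that idea, with a toy linear "core" standing in for the paper's WaveNet; all names, shapes, and the step size are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

# Toy stand-in for the shared conditional core (NOT WaveNet): the output
# depends on input features and a per-speaker embedding. Hypothetical shapes.
rng = np.random.default_rng(0)
D_IN, D_EMB = 4, 3
W_core = rng.normal(size=(D_IN + D_EMB,))        # shared weights, frozen at adaptation

def synthesize(x, emb):
    """Toy 'TTS core': concatenate features with the speaker embedding."""
    return np.concatenate([x, emb]) @ W_core

# Pretend "true" new speaker and a handful of adaptation examples (few data).
target_emb = rng.normal(size=D_EMB)
xs = rng.normal(size=(5, D_IN))
ys = np.array([synthesize(x, target_emb) for x in xs])

# Few-shot adaptation: gradient steps on ONLY the new speaker's embedding,
# never touching W_core -- the embedding-only adaptation strategy.
emb = np.zeros(D_EMB)
w_e = W_core[D_IN:]
lr = 1.0 / (w_e @ w_e)                            # exact step size for this toy quadratic
for _ in range(100):
    grad = np.zeros(D_EMB)
    for x, y in zip(xs, ys):
        grad += (synthesize(x, emb) - y) * w_e    # gradient w.r.t. the embedding only
    emb -= lr * grad / len(xs)

loss = np.mean([(synthesize(x, emb) - y) ** 2 for x, y in zip(xs, ys)])
print(f"adaptation loss: {loss:.6g}")
```

Because only `D_EMB` parameters are fitted, a few examples suffice and the shared core cannot overfit the new speaker, which is the sample-efficiency argument in the abstract.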

Cited by 32 publications (57 citation statements)
References 26 publications
“…SEA-TTS [9] uses WaveNet [1] as its basic architecture and compares the two voice cloning approaches. As noted in [8], adapting the whole model might result in overfitting, so SEA-TTS proposed two techniques to deal with it.…”
Section: Experimental Results
Confidence: 99%
“…There are two general approaches to this task [8]: speaker adaptation [8], [9], [10], [11], [12] and speaker encoding [6], [13], [7], [14], [15], [16], [17]. The speaker encoding method builds a multi-speaker TTS architecture which consists of a speaker encoder and a TTS model.…”
Confidence: 99%
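The quoted passage contrasts speaker adaptation (gradient fine-tuning on the new speaker's data, as above) with speaker encoding, where a separately trained encoder maps a few seconds of reference audio directly to an embedding in a single forward pass, with no gradient steps at deployment. A minimal hypothetical sketch of the encoding route, with a mean-pool-and-project encoder standing in for a real speaker encoder; all shapes and names are assumptions:

```python
import numpy as np

# Hypothetical speaker encoder: average reference frames, then project into
# the embedding space. A real encoder would be a trained neural network.
rng = np.random.default_rng(1)
D_FRAME, D_EMB = 8, 3
W_enc = rng.normal(size=(D_FRAME, D_EMB))     # trained jointly with the TTS model

def speaker_encoder(reference_frames):
    """One forward pass: no per-speaker optimization at deployment time."""
    return reference_frames.mean(axis=0) @ W_enc

ref_frames = rng.normal(size=(50, D_FRAME))   # ~a few seconds of reference features
emb = speaker_encoder(ref_frames)             # embedding is ready immediately
print("speaker embedding:", np.round(emb, 3))
```

The trade-off the citing papers discuss follows directly: encoding is instant and cannot overfit, while adaptation (fitting the embedding or the whole model by gradient descent) is slower but can match the target voice more closely.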