2019
DOI: 10.1007/978-3-030-29516-5_5

Exploring Transfer Learning for Low Resource Emotional TTS

Abstract: During the last few years, spoken language technologies have improved considerably thanks to Deep Learning. However, Deep Learning-based algorithms require large amounts of data that are often difficult and costly to gather. In particular, modeling the variability in speech across different speakers, styles, or emotions with little data remains challenging. In this paper, we investigate how to leverage fine-tuning on a pre-trained Deep Learning-based TTS model to synthesize speech with a small dataset of …
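The abstract's core idea, fine-tuning a model pre-trained on plentiful data with only a small target dataset, can be illustrated with a toy analogue. This is a sketch under stated assumptions, not TTS and not the authors' implementation: it assumes a one-variable linear model trained by plain SGD, where a large hypothetical "neutral" dataset stands in for the pre-training data and a 10-sample hypothetical "emotional" dataset stands in for the low-resource target. The helpers `sgd_fit`, `mse`, and `make_domain`, and every number below, are illustrative assumptions.

```python
import random

def sgd_fit(data, w, b, lr, epochs):
    """Run plain SGD on squared error for the model y = w*x + b,
    starting from the given (w, b); returns the updated parameters."""
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y   # prediction error on one sample
            w -= lr * err * x       # gradient of 0.5*err**2 w.r.t. w
            b -= lr * err           # gradient of 0.5*err**2 w.r.t. b
    return w, b

def mse(data, w, b):
    """Mean squared error of y = w*x + b on data."""
    return sum(((w * x + b) - y) ** 2 for x, y in data) / len(data)

def make_domain(rng, n, slope, offset, noise=0.1):
    """Hypothetical toy domain: n noisy samples of y = slope*x + offset."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, slope * x + offset + rng.gauss(0.0, noise)) for x in xs]

rng = random.Random(0)
source = make_domain(rng, 1000, slope=2.0, offset=1.0)   # large "neutral" set (hypothetical)
target = make_domain(rng, 10, slope=2.0, offset=1.5)     # tiny "emotional" set (hypothetical)
held_out = make_domain(rng, 200, slope=2.0, offset=1.5)  # target-domain test set

# 1) Pre-train from zero on the large source dataset.
pretrained = sgd_fit(source, 0.0, 0.0, lr=0.05, epochs=5)

# 2) Fine-tune: start from the pre-trained parameters, train briefly on the target set.
finetuned = sgd_fit(target, *pretrained, lr=0.05, epochs=5)

# Baseline: the same small training budget on the target set, but from scratch.
scratch = sgd_fit(target, 0.0, 0.0, lr=0.05, epochs=5)

ft_mse = mse(held_out, *finetuned)
scratch_mse = mse(held_out, *scratch)
print(f"fine-tuned MSE: {ft_mse:.4f}  from-scratch MSE: {scratch_mse:.4f}")
```

Starting the small-data run from the pre-trained parameters means that only what differs between the domains (here, the offset) has to be learned from the 10 target samples; with the same budget, the from-scratch baseline must also recover the shared slope from those few samples.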

Cited by 32 publications (30 citation statements). References: 17 publications.
“…The rationale behind this idea is that fine-tuning guides the model to focus on the space that matters the most. Unlike many existing low-resource TTS fine-tuning techniques [7,10,13], the target data is here already present in the so-called pre-training step, making our fine-tuning step more of a refinement step.…”
Section: Fine-tuning (mentioning; confidence: 99%)
“…Research that focuses on low-resource TTS tries to mitigate the effects of limited data via multi-speaker modelling and transfer learning [7][8][9][10][11][12][13]. By transferring knowledge gained from high-resource speakers, the quality of low-resource systems improves.…”
Section: Introduction (mentioning; confidence: 99%)
“…Its operations should be inspired by the features to model (1D convolution or RNN cells for long-term context, attention mechanism for recursive relationships). It should have a way to control expressiveness either with a categorical representation [23] or a continuous representation [24]. But it is important to take into account that annotations should not be acquired from humans by asking them to give absolute values on subjective concepts, but rather by asking them to compare examples.…”
Source: The Theory behind Controllable Expressive Speech Synthesis: a Cross-disciplinary Approach
Section: Summary and Application (mentioning; confidence: 99%)
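The passage above distinguishes categorical from continuous control of expressiveness and recommends collecting comparative rather than absolute annotations. The quoted text does not say how comparisons are turned into values, so this sketch swaps in one common choice, the Bradley-Terry model fitted with the standard minorization-maximization (MM) update, to convert pairwise judgements into continuous scores, next to a one-hot code for categorical control. The emotion labels, clip names `a`/`b`/`c`, the judgements, and the helpers `one_hot` and `bradley_terry` are all hypothetical.

```python
from collections import defaultdict

def one_hot(label, labels):
    """Categorical control: one-hot vector over a fixed (hypothetical) emotion inventory."""
    return [1.0 if lab == label else 0.0 for lab in labels]

def bradley_terry(comparisons, items, iters=200):
    """Turn pairwise judgements (winner, loser) into continuous scores with the
    Bradley-Terry model, fitted by the MM update
        s_i <- wins_i / sum_j n_ij / (s_i + s_j).
    Assumes no item is compared with itself, all items are linked by comparisons,
    and every item wins and loses at least once; otherwise the maximum-likelihood
    scores may not exist."""
    wins = defaultdict(float)
    pair_counts = defaultdict(float)  # number of comparisons per unordered pair
    for winner, loser in comparisons:
        wins[winner] += 1.0
        pair_counts[frozenset((winner, loser))] += 1.0
    strength = {i: 1.0 for i in items}
    for _ in range(iters):
        new = {}
        for i in items:
            denom = 0.0
            for pair, n in pair_counts.items():
                if i in pair:
                    (j,) = pair - {i}  # the other item in this pair
                    denom += n / (strength[i] + strength[j])
            new[i] = wins[i] / denom if denom > 0 else strength[i]
        total = sum(new.values())
        # Rescale so the scores average 1; Bradley-Terry is scale-invariant.
        strength = {i: v * len(items) / total for i, v in new.items()}
    return strength

# Hypothetical judgements: "which of these two clips sounds more intense?"
judgements = (
    [("c", "b")] * 3 + [("b", "c")]
    + [("b", "a")] * 3 + [("a", "b")]
    + [("c", "a")] * 3 + [("a", "c")]
)
scores = bradley_terry(judgements, ["a", "b", "c"])
print({k: round(v, 3) for k, v in sorted(scores.items())})
print(one_hot("angry", ["neutral", "angry", "happy"]))
```

Raters only ever answer relative questions here, yet the fitted scores lie on a continuous scale, which is one way such comparisons could feed a continuous control representation.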
“…In this study [32], we aim to find out whether it is possible to obtain an emotional TTS system by fine-tuning a neutral TTS system with a small emotional speech dataset. We study the impact of this fine-tuning on the intelligibility of generated speech and the subjective perception of the generated speech.…”
Section: A Synthesis With Emotion Adaptation (mentioning; confidence: 99%)