2021
DOI: 10.48550/arxiv.2106.10229
Preprint

A learned conditional prior for the VAE acoustic space of a TTS system

Abstract: Many factors influence speech, yielding different renditions of a given sentence. Generative models, such as variational autoencoders (VAEs), capture this variability and allow multiple renditions of the same sentence via sampling. The degree of prosodic variability depends heavily on the prior that is used when sampling. In this paper, we propose a novel method to compute an informative prior for the VAE latent space of a neural text-to-speech (TTS) system. By doing so, we aim to sample with more prosodic varia…
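The abstract's core idea, sampling the VAE latent from an informative prior rather than a fixed standard normal, can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's actual architecture: the linear "prior network," its weights, and the `text_features` placeholder are all assumptions introduced here for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)
Z_DIM, TXT_DIM = 2, 4

# Standard VAE sampling: latent drawn from an uninformative N(0, I) prior.
z_standard = rng.standard_normal(Z_DIM)

# Learned conditional prior (sketch): a small network predicts the prior's
# mean and log-variance from text-side features, so samples concentrate
# where plausible prosody for *this* sentence lies. A linear map stands in
# for the prior network; real systems would use a learned neural network.
W_mu = rng.standard_normal((TXT_DIM, Z_DIM)) * 0.1
W_logvar = rng.standard_normal((TXT_DIM, Z_DIM)) * 0.1

text_features = rng.standard_normal(TXT_DIM)  # placeholder text encoding
mu_p = text_features @ W_mu
std_p = np.exp(0.5 * (text_features @ W_logvar))
z_conditional = mu_p + std_p * rng.standard_normal(Z_DIM)
```

Sampling `z_conditional` repeatedly for the same sentence yields varied but sentence-appropriate latents, whereas `z_standard` ignores the text entirely.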

Cited by 2 publications (1 citation statement)
References 15 publications
“…For a targeted generation of materials with specific properties, we operate the VAE as a conditional model [71, 72]. Such class-conditional learning can easily be incorporated while keeping a similar training procedure. To this end, both the encoder and the decoder receive the class label of each training sample, which is concatenated with their respective inputs.…”
Section: Generative Framework
Confidence: 99%
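The conditioning mechanism the citing work describes, concatenating the class label with both the encoder and decoder inputs, can be sketched as a minimal numpy forward pass. All weights, dimensions, and names below are illustrative assumptions, not the cited papers' implementation, and a real model would learn the weights rather than draw them at random.

```python
import numpy as np

rng = np.random.default_rng(0)
X_DIM, Y_DIM, Z_DIM, H_DIM = 8, 3, 2, 16

# Illustrative random weights; training would fit these to data.
W_enc = rng.standard_normal((X_DIM + Y_DIM, H_DIM)) * 0.1
W_mu = rng.standard_normal((H_DIM, Z_DIM)) * 0.1
W_logvar = rng.standard_normal((H_DIM, Z_DIM)) * 0.1
W_dec = rng.standard_normal((Z_DIM + Y_DIM, X_DIM)) * 0.1

def encode(x, y_onehot):
    # Conditioning: the class label is concatenated with the encoder input.
    h = np.tanh(np.concatenate([x, y_onehot]) @ W_enc)
    return h @ W_mu, h @ W_logvar  # mean, log-variance of q(z | x, y)

def reparameterize(mu, logvar):
    # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I).
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z, y_onehot):
    # Conditioning: the same label is concatenated with the decoder input.
    return np.concatenate([z, y_onehot]) @ W_dec

x = rng.standard_normal(X_DIM)     # one training sample
y = np.eye(Y_DIM)[1]               # its one-hot class label

mu, logvar = encode(x, y)
z = reparameterize(mu, logvar)
x_hat = decode(z, y)               # reconstruction conditioned on the class
```

Because the label enters both networks, targeted generation at inference time reduces to decoding a sampled latent together with the desired class label.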