Interspeech 2020
DOI: 10.21437/interspeech.2020-1411

Dynamic Prosody Generation for Speech Synthesis Using Linguistics-Driven Acoustic Embedding Selection


Cited by 24 publications (22 citation statements)
References: 22 publications
“…This is perhaps an aspect of a TTS voice that can easily be improved. Modern technologies already make an attempt at producing a trade-off between naturalness and variability that is similar to human speech (Tyagi et al, 2019). Human speech varies due to contextual as well as other more arbitrary factors such as prosody dynamics.…”
Section: Qualitative Analyses (mentioning; confidence: 99%)
“…TP-GST [21] predicts embeddings, but only using segmental information. Tyagi et al [17] use additional context information, but don't train a prediction model. We argue that additional context information is vital to improving prosody quality in TTS.…”
Section: Introduction (mentioning; confidence: 99%)
“…Our TTS system is based on a Tacotron-like [2] structure with an additional variational auto-encoder (VAE) [26] to capture prosody [27,28] (Fig. 2).…”
Section: Architecture (mentioning; confidence: 99%)