2022 IEEE Spoken Language Technology Workshop (SLT) 2023
DOI: 10.1109/slt54892.2023.10022793
|View full text |Cite
|
Sign up to set email alerts
|

On Granularity of Prosodic Representations in Expressive Text-to-Speech

Abstract: In expressive speech synthesis it is widely adopted to use latent prosody representations to deal with variability of the data during training. Same text may correspond to various acoustic realizations, which is known as a one-to-many mapping problem in text-to-speech. Utterance, word, or phoneme-level representations are extracted from target signal in an auto-encoding setup, to complement phonetic input and simplify that mapping. This paper compares prosodic embeddings at different levels of granularity and … Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

0
0
0

Year Published

2023
2023
2024
2024

Publication Types

Select...
1
1

Relationship

0
2

Authors

Journals

citations
Cited by 2 publications
references
References 37 publications
0
0
0
Order By: Relevance