Interspeech 2018
DOI: 10.21437/interspeech.2018-1113

Expressive Speech Synthesis via Modeling Expressions with Variational Autoencoder

Abstract: Recent advances in neural autoregressive models have improved the performance of speech synthesis (SS). However, because these models lack the ability to model global characteristics of speech (such as speaker individualities or speaking styles), particularly when these characteristics have not been labeled, making neural autoregressive SS systems more expressive is still an open issue. In this paper, we propose to combine VoiceLoop, an autoregressive SS model, with a Variational Autoencoder (VAE). This approach, unlike tradi…

citations
Cited by 117 publications
(83 citation statements)
references
References 17 publications
“…With regards to research, a number of themes do indeed echo topics within speech interface research. For instance, vocal quality clearly maps to work on speech synthesis, where developing more human-like, expressive [1], emotive [39] and personality-filled [51] voices is currently underway. User research has also focused on exploring the role of humanness in partner knowledge assumptions [15], vocal quality [16,37], partner identity [4], linguistic content [13] and conversational interactivity [13,45].…”
Section: Discussion (mentioning)
confidence: 99%
“…VAEs have been demonstrated for speech synthesis [18,19], voice conversion [20], and intonation modelling [21,Chapter 7]. Discrete representations have also been incorporated into the VAE framework [22,23].…”
Section: Related Work (mentioning)
confidence: 99%
“…text generation [9], image generation [10,11] and speech generation [12,13] tasks. VAE has many merits, such as learning disentangled factors, smoothly interpolating or continuously sampling between latent representations which can obtain interpretable homotopies [9].…”
Section: Introduction (mentioning)
confidence: 99%