2020
DOI: 10.1016/j.cogsys.2019.09.009

Building a controllable expressive speech synthesis system with multiple emotion strengths

Cited by 16 publications (12 citation statements)
References 14 publications
“…Naturally, there have been several attempts to obtain emotion strength information without manual work. Zhu and Xue [48] applied a K-means clustering algorithm to partition the emotion strength levels of a speech corpus and derived an embedding vector that represents emotion strength continuously, using the t-distributed stochastic neighbor embedding (t-SNE) algorithm [49]. Zhu et al. [50] also applied the concept of relative attributes to learn a ranking function for each emotion category and thereby controlled the emotion strength continuously.…”
Section: Emotion Strength Control Modelling (mentioning)
confidence: 99%
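The pipeline described in this excerpt can be sketched in a few lines: cluster utterance-level features into discrete emotion-strength levels with K-means, then project them with t-SNE to obtain a low-dimensional space in which strength varies continuously. This is a minimal illustration of the general technique, not the cited papers' actual setup; the feature dimensions and data below are synthetic assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Toy per-utterance acoustic statistics (e.g. mean F0, energy, duration);
# dimensions are illustrative assumptions.
features = rng.normal(size=(120, 8))

# Partition utterances into three emotion-strength levels (e.g. weak/medium/strong).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
strength_level = kmeans.fit_predict(features)

# Project to 2-D with t-SNE; the positions serve as a continuous strength embedding.
embedding = TSNE(n_components=2, perplexity=15, random_state=0).fit_transform(features)

print(strength_level.shape, embedding.shape)  # (120,) (120, 2)
```

The discrete cluster labels give coarse strength levels for training, while the t-SNE coordinates provide a continuous control signal at synthesis time.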
“…The first works on DNN-based acoustic speech synthesis appeared in 2013 and used feed-forward DNNs to model the mapping between linguistic and acoustic features [12,29,30,31]. Later studies worked on adding expressiveness to the synthesized voice [32,20,33,34]. Regarding audiovisual speech, some works used DNNs to model emotion categories, such as [35] and [16], who used feed-forward DNNs to synthesize expressive audiovisual speech.…”
Section: Related Work (mentioning)
confidence: 99%
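The 2013-era approach this excerpt summarizes, a feed-forward DNN regressing frame-level acoustic features from linguistic features, can be sketched as follows. The data, feature dimensions, and network sizes are synthetic placeholders, not taken from any of the cited systems.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
# Toy frame-level linguistic features (e.g. phone identity, position, prosodic context).
linguistic = rng.normal(size=(500, 30))
# Toy acoustic targets (e.g. spectral coefficients, F0) with a linear dependency.
acoustic = (linguistic @ rng.normal(size=(30, 12))) * 0.1

# Feed-forward DNN mapping linguistic features to acoustic features.
dnn = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=300, random_state=1)
dnn.fit(linguistic, acoustic)

pred = dnn.predict(linguistic[:4])
print(pred.shape)  # (4, 12)
```

At synthesis time, the predicted acoustic frames would be passed to a vocoder to generate the waveform.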
“…Henter et al. [39] and Zhu et al. [34] succeeded in creating nuances of emotion without using emotion-degree annotations; nevertheless, their work still relies on emotion labels as input.…”
Section: Related Work (mentioning)
confidence: 99%
“…The aim of this paper is a theoretical analysis and comparison of two innovative methods that seek to model the complexity of emotions as continuous vectors that can be manipulated. In the first approach [5], a control embedding vector is used as an additional input to an LSTM (Long Short-Term Memory) deep neural network…”
Section: Uvod (Introduction) (unclassified)
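The first approach mentioned in this excerpt feeds a control embedding vector into an LSTM alongside the regular per-frame input. A minimal sketch of that idea, a hand-rolled single LSTM cell in NumPy with the control embedding concatenated to every frame, is given below; all dimensions and weights are arbitrary assumptions, not the cited system's architecture.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM cell step; gate pre-activations stacked as [input, forget, cell, output]."""
    n = h.shape[0]
    z = W @ x + U @ h + b
    i = 1 / (1 + np.exp(-z[:n]))          # input gate
    f = 1 / (1 + np.exp(-z[n:2 * n]))     # forget gate
    g = np.tanh(z[2 * n:3 * n])           # candidate cell state
    o = 1 / (1 + np.exp(-z[3 * n:]))      # output gate
    c_new = f * c + i * g
    return o * np.tanh(c_new), c_new

rng = np.random.default_rng(2)
feat_dim, ctrl_dim, hid = 16, 4, 8
control = rng.normal(size=ctrl_dim)       # emotion-control embedding, fixed per utterance (assumption)
W = rng.normal(size=(4 * hid, feat_dim + ctrl_dim)) * 0.1
U = rng.normal(size=(4 * hid, hid)) * 0.1
b = np.zeros(4 * hid)

h, c = np.zeros(hid), np.zeros(hid)
for _ in range(5):                        # five frames of a toy utterance
    # Concatenate the control embedding to each frame's input features.
    x = np.concatenate([rng.normal(size=feat_dim), control])
    h, c = lstm_step(x, h, c, W, U, b)

print(h.shape)  # (8,)
```

Because the control vector enters every frame, varying it smoothly at synthesis time shifts the network's hidden trajectory, which is what makes continuous manipulation of the emotion possible.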