2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
DOI: 10.1109/apsipa.2017.8282231
An investigation to transplant emotional expressions in DNN-based TTS synthesis

Cited by 26 publications (15 citation statements). References 12 publications.
“…Using input codes for representing different styles is also presented in [119, 120]. There have also been attempts at style transplantation, i.e., producing speech in the voice of speaker A in style X without having any sentence from speaker A in style X in the training data, in which case the network is forced to learn the style X from other speakers in the training database [121, 122].…”
Section: Progress In Speech Recognition and Synthesis As Well As
Confidence: 99%
“…For generalizing emotional expressions to new speakers, early SPSS studies [13, 14] tried to represent emotion as additive factors. [15] investigated several deep neural network (DNN) architectures for emotion transplantation; that work used 3 emotional and 21 neutral speakers.…”
Section: Multi-speaker ESS
Confidence: 99%
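The "additive factor" modelling mentioned in the snippet above treats emotional speech as the neutral prediction plus an emotion-specific offset. A minimal sketch, assuming made-up offset values and a 4-dimensional acoustic frame (neither is taken from [13, 14]):

```python
import numpy as np

# Hedged sketch of emotion as an additive factor: the emotional acoustic
# frame is the neutral frame plus a per-emotion offset vector. The
# dimensions and values below are illustrative assumptions.

neutral_frame = np.array([1.0, 0.5, -0.2, 0.3])   # neutral acoustic prediction

emotion_offsets = {
    "happy": np.array([0.2, 0.1, 0.0, -0.1]),
    "sad":   np.array([-0.3, -0.1, 0.1, 0.0]),
}

def apply_emotion(frame, emotion):
    """Emotional speech = neutral speech + emotion-specific additive factor."""
    return frame + emotion_offsets[emotion]

happy_frame = apply_emotion(neutral_frame, "happy")
# -> array([1.2, 0.6, -0.2, 0.2])
```

Because the offset is independent of the speaker, the same emotion factor learned on one speaker can be added to another speaker's neutral prediction, which is what makes this formulation attractive for transplantation.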
“…The first works in DNN-based acoustic speech synthesis appeared in 2013 and used feed-forward DNNs to model the mapping between linguistic and acoustic features [12, 29, 30, 31]. Later studies worked on adding expressiveness to the synthesized voice [32, 20, 33, 34]. Regarding audiovisual speech, some works, such as [35] and [16], used feed-forward DNNs to model emotion categories and synthesize expressive audiovisual speech.…”
Section: Related Work
Confidence: 99%