2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
DOI: 10.23919/apsipa.2018.8659599

A DNN-based emotional speech synthesis by speaker adaptation

Cited by 6 publications (7 citation statements) | References 15 publications
“…RMSE is an established metric used [33], [34], [37], [49], [50], [52], [57] to evaluate the proximity of predicted values by a mapping algorithm to those of the target, given as…”
Section: Root Mean Square Error (citation type: mentioning; confidence: 99%)
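The equation in this statement is truncated ("given as…"); the formula these citing papers refer to is presumably the standard definition of RMSE,

    RMSE = \sqrt{ \frac{1}{N} \sum_{n=1}^{N} ( \hat{y}_n - y_n )^2 },

where \hat{y}_n is the predicted value, y_n the corresponding target value, and N the number of samples.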
“…Most of these strategies were based on a hidden Markov model using constrained structural maximum a posteriori linear regression (CSMAPLR) adaptation [51] or on training a speaker-dependent model by adapting from speaker-independent average models [52]. In addition, the method in [52] used as little as 5 min of emotional data from the target speaker and could produce an appreciable perception of the synthesized emotion, though at the cost of speech quality. CSMAPLR considers the linguistic information from the regression tree, thereby distinguishing it from other adaptation approaches.…”
Section: Introduction (citation type: mentioning; confidence: 99%)
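For context (a standard result in this literature, not quoted from the citing paper): linear-regression adaptation of this family re-estimates the average model's Gaussian parameters through a shared affine transform,

    \hat{\mu} = A\mu + b, \qquad \hat{\Sigma} = A \Sigma A^{\top},

where the constrained variant uses the same transform A for means and covariances; CSMAPLR additionally places a structural maximum a posteriori prior on the transforms, tied through the regression tree mentioned above.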
“…By using a speaker adaptation method, Yang et al. [13] proposed a method that generates emotional speech from a small amount of emotional speech training data. In all of the aforementioned methods, the target speaker's emotional speech is necessary for training.…”
Section: Introduction (citation type: mentioning; confidence: 99%)
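The adaptation idea in the two statements above can be illustrated with a minimal PyTorch sketch: a DNN acoustic model pre-trained as a speaker-independent average voice is fine-tuned on a small emotional corpus from the target speaker. Everything here (class names, feature dimensions, hyperparameters) is an illustrative assumption, not the cited papers' implementation.

    import torch
    import torch.nn as nn

    class AcousticModel(nn.Module):
        """Toy DNN mapping linguistic features to acoustic features
        (dimensions are illustrative assumptions)."""
        def __init__(self, in_dim=300, hidden=512, out_dim=187):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, out_dim),
            )

        def forward(self, x):
            return self.net(x)

    def adapt(model, batches, epochs=10, lr=1e-4):
        """Fine-tune a pre-trained average-voice model on a small
        emotional corpus; the low learning rate limits over-fitting
        to the few minutes of adaptation data."""
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.MSELoss()
        for _ in range(epochs):
            for ling, acoustic in batches:
                opt.zero_grad()
                loss_fn(model(ling), acoustic).backward()
                opt.step()
        return model

    # Stand-in for a few minutes of the target speaker's emotional data.
    batches = [(torch.randn(8, 300), torch.randn(8, 187)) for _ in range(50)]
    adapted = adapt(AcousticModel(), batches)  # pre-training omitted here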
“…This works even if the target speaker's emotional speech is not included in the training data. A key idea is to explicitly control the speaker factor and the emotional factor, motivated by the success of multi-speaker models [6,7,8,9,10] and multi-emotion models [11,12,13]. Once the factors are trained, they can be controlled independently to synthesize speech with any combination of speaker and emotion.…”
Section: Introduction (citation type: mentioning; confidence: 99%)
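The factorized control described in this statement can likewise be sketched in PyTorch: independent speaker and emotion embedding tables condition a shared decoder, so any (speaker, emotion) pair can be requested at synthesis time, including combinations unseen in training. Layer names and sizes are assumptions for illustration, not the cited paper's architecture.

    import torch
    import torch.nn as nn

    class FactorizedTTS(nn.Module):
        """Speaker and emotion factors as separate embedding tables
        conditioning one shared decoder (sizes are assumptions)."""
        def __init__(self, n_speakers, n_emotions,
                     text_dim=256, emb_dim=64, mel_dim=80):
            super().__init__()
            self.speaker_emb = nn.Embedding(n_speakers, emb_dim)
            self.emotion_emb = nn.Embedding(n_emotions, emb_dim)
            self.decoder = nn.Sequential(
                nn.Linear(text_dim + 2 * emb_dim, 512), nn.ReLU(),
                nn.Linear(512, mel_dim),
            )

        def forward(self, text_enc, speaker_id, emotion_id):
            # text_enc: (batch, frames, text_dim) text-encoder output
            cond = torch.cat([self.speaker_emb(speaker_id),
                              self.emotion_emb(emotion_id)], dim=-1)
            cond = cond.unsqueeze(1).expand(-1, text_enc.size(1), -1)
            return self.decoder(torch.cat([text_enc, cond], dim=-1))

    # Combine speaker 3 with emotion 2 even if that pairing never
    # occurred together in the training data.
    model = FactorizedTTS(n_speakers=10, n_emotions=4)
    mel = model(torch.randn(1, 50, 256), torch.tensor([3]), torch.tensor([2]))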
“…In recent years, research has not only improved quality so that synthesized voices resemble real ones, but has also diversified expression (Govind and Prasanna, 2013; Kaur and Singh, 2015; Skerry-Ryan et al., 2018). Emotional speech synthesis is a technique for diversifying the expression of speech synthesis (Qin et al., 2006; Schröder, 2001; Yang et al., 2018). It specifies emotional parameters together with the text input so that the speech reflects the designated emotion (Charfuelan and Steiner, 2013; Inoue et al., 2017; Iwata and Kobayashi, 2011; Nose and Kobayashi, 2013).…”
Section: Introduction (citation type: mentioning; confidence: 99%)