2015
DOI: 10.1016/j.csl.2015.03.008

Emotion transplantation through adaptation in HMM-based speech synthesis

Abstract: This paper proposes an emotion transplantation method capable of modifying a synthetic speech model through the use of CSMAPLR adaptation in order to incorporate emotional information learned from a different speaker model while maintaining the identity of the original speaker as much as possible. The proposed method relies on learning both emotional and speaker identity information by means of their adaptation functions from an average voice model, and combining them into a single cascade transform capable of …
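The abstract describes combining a speaker adaptation transform and an emotion adaptation transform, each learned from the same average voice model, into a single cascaded transform. A minimal sketch of that idea, assuming CMLLR/CSMAPLR-style affine transforms on Gaussian mean vectors (all names here are hypothetical, not the paper's code):

```python
import numpy as np

# CSMAPLR-style adaptation applies an affine transform to a Gaussian
# mean vector: mu' = A @ mu + b. The transplantation idea sketched
# below cascades a speaker transform and an emotion transform that
# were learned separately from the same average-voice model.

def affine(A, b, mu):
    """Apply one affine adaptation transform to a mean vector."""
    return A @ mu + b

def cascade(A_spk, b_spk, A_emo, b_emo, mu_avg):
    """Speaker transform first, then emotion transform on top."""
    return affine(A_emo, b_emo, affine(A_spk, b_spk, mu_avg))

rng = np.random.default_rng(0)
d = 3  # toy feature dimension
A_s, b_s = np.eye(d) * 1.1, rng.normal(size=d)  # speaker transform
A_e, b_e = np.eye(d) * 0.9, rng.normal(size=d)  # emotion transform
mu = rng.normal(size=d)                          # average-voice mean

# The cascade collapses into one affine transform:
#   A = A_emo @ A_spk,  b = A_emo @ b_spk + b_emo
composed = (A_e @ A_s) @ mu + (A_e @ b_s + b_e)
assert np.allclose(cascade(A_s, b_s, A_e, b_e, mu), composed)
```

Because two affine maps compose into a single affine map, the speaker and emotion transforms can be stored and applied as one cascade transform, which is the mechanism the abstract alludes to.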


Cited by 33 publications (12 citation statements) | References 20 publications
“…Their main advantage is the possibility of generating speaker-independent emotional speech models that aggregate information from different speakers and different emotions, providing highly controllable and reliable systems. Tasks on which they have proven successful range from emotional intensity control of 2 emotional classes (joyful and sad) (Nose and Kobayashi, 2012) to emotion transplantation of 4 emotional classes (angry, happy, sad and surprised) (Lorenzo-Trueba et al., 2015).…”
Section: Overview of Emotional Speech Synthesis
confidence: 99%
“…speech rate, speech intensity, F0, and F0 range), the HTS-et synthesizer was chosen for the parametric modelling of emotions. In addition to parametric synthesis of emotional speech, this statistical parametric synthesis method is also suitable for a corpus-based approach; for instance, corpora of emotional speech are used to train speech models (Yamagishi et al. 2005, Lorenzo-Trueba et al. 2015). We used two synthetic voices trained on neutral speech, a male voice, Tõnu, and a female voice, Eva, as the basic voices of the Estonian HTS synthesis.…”
Section: Estonian Speech Synthesizers
confidence: 99%
“…Only a few adaptation strategies have been tested extensively by emotion-synthesis researchers. Most of these strategies were based on a hidden Markov model using constrained structural maximum a posteriori linear regression (CSMAPLR) adaptation [51] or on training a speaker-dependent model by adapting from speaker-independent average models [52]. In addition, the method in [52] used as little as 5 min of emotional data from the target speaker and could produce an appreciable perception of the synthesized emotion, though at some cost to speech quality.…”
Section: Introduction
confidence: 99%