Prosody conversion from neutral speech to emotional speech

Tao, Jianhua; Kang, Yongguo; Li, Aijun

doi:10.1109/tasl.2006.876113

Cited by 178 publications

(14 citation statements)

References 12 publications


“…The most cited paper before the 2000s [9] had more than 200 citations. Between 2000 to 2010, the most cited paper [10] also had more than 200 citations. Finally, the most cited paper [11] after 2010 also had more than 200 citations, indicating an increased interest in emotional speech synthesis in recent years.…”

Section: Past Studies On Emotional Speech Synthesis (mentioning)

confidence: 99%

“…In the early 2000s, the trend shifted to parametric speech synthesis, with hidden Markov model (HMM)-based synthesis being the most popular (see rows three to eight of Table 1). Parametric speech synthesis increased the need for good quality databases (the term good quality here refers to recordings in recording studio environments that have controlled noise levels) with adequate phonetic coverage (between 500 [15, 16] to 1500 [10] sentences and larger corpora with 11 h of neutral speech recording [15]). (Neutral in this context refers to speech without any emotions.)…”

Section: Past Studies On Emotional Speech Synthesis (mentioning)

confidence: 99%


Exploring Prosodic Features Modelling for Secondary Emotions Needed for Empathetic Speech Synthesis

James, Balamurali, Watson, et al. 2023. Sensors.

A low-resource emotional speech synthesis system for empathetic speech synthesis based on modelling prosody features is presented here. Secondary emotions, identified to be needed for empathetic speech, are modelled and synthesised in this investigation. As secondary emotions are subtle in nature, they are difficult to model compared to primary emotions. This study is one of the few to model secondary emotions in speech as they have not been extensively studied so far. Current speech synthesis research uses large databases and deep learning techniques to develop emotion models. There are many secondary emotions, and hence, developing large databases for each of the secondary emotions is expensive. Hence, this research presents a proof of concept using handcrafted feature extraction and modelling of these features using a low-resource-intensive machine learning approach, thus creating synthetic speech with secondary emotions. Here, a quantitative-model-based transformation is used to shape the emotional speech’s fundamental frequency contour. Speech rate and mean intensity are modelled via rule-based approaches. Using these models, an emotional text-to-speech synthesis system to synthesise five secondary emotions (anxious, apologetic, confident, enthusiastic and worried) is developed. A perception test to evaluate the synthesised emotional speech is also conducted. The participants could identify the correct emotion in a forced response test with a hit rate greater than 65%.


“…Items with a recognition rate of less than 60% were discarded. The limit of 60% was chosen based on Tickle's (2000) recommendation.…”

Section: Recognizable Emotional Utterances (mentioning)

confidence: 99%

On English Speakers’ Ability to Communicate Emotion in Mandarin

Jian, 2015. The Canadian Modern Language Review.

The ability of Mandarin learners to express emotion in Mandarin has received little attention. This study examines how English L1 users express emotions in Mandarin and how this expression differs from that of Mandarin L1 users. Scenarios were adopted to elicit joy, anger, sadness, fear, and neutrality. Both groups articulated anger, joy, and fear with a high pitch. Both groups also employed high intensity for anger and joy and low intensity for sadness and fear. Learners generally employed larger F0 ranges than native speakers, particularly for anger and fear. Learners articulated level tones with lengthened duration and contour tones with shortened duration, affecting the correctness of the portrayal of emotions. Learners used a similar intensity range for all emotions, whereas native speakers tended to vary the intensity with different emotions. The results have implications for teaching Mandarin as a second language with special reference to prosodic naturalness in expressing emotions.


“…Basically, they changed prosody parameters like basic frequency (F0), value and pitch, of neutral speech to make speech emotional. [4] Murtaza Bulut models the prosody parameters of part of speech (POS) to enhance the naturalness of emotional speech. [1] Shinya Mori divides the prosody parameter space into some subspaces, and research the restriction of these subspaces to give speech emotion.…”

Section: Introduction (mentioning)

confidence: 99%

Emotional speech synthesis by XML file using interactive genetic algorithms

Wang, 2009. Proceedings of the First ACM/SIGEVO Summit on Genetic and Evolutionary Computation.

As a technique that can "let computer speak", speech synthesis is drawing more and more attention. Today, much speech synthesis software can synthesize neutral speech naturally and flowingly. However, it is hard to make computers speak with "emotion" as that in our daily life, because of the complexity of emotion model. Interactive Genetic Algorithms which can be acted self-organizingly, adaptively and self-learningly can just resolve the problem of difficulty in modeling emotional speech synthesis. As a result, this paper designs an emotional speech synthesis process, which adjusts the parameters (XML-tags) used to synthesize emotional speech dynamically, using interactive Genetic Algorithms, to optimize the quality of emotional speech. Also, the paper includes an evaluation experiment, which proves the feasibility of the algorithms.
