9th ISCA Speech Synthesis Workshop (SSW 9), 2016
DOI: 10.21437/ssw.2016-4
Prediction of Emotions from Text using Sentiment Analysis for Expressive Speech Synthesis

Abstract: The generation of expressive speech is a great challenge for text-to-speech synthesis in audiobooks. One of the most important factors is the variation in speech emotion or voice style. In this work, we developed a method to predict the emotion of a sentence so that it can be conveyed through the synthetic voice. The method combines a standard emotion-lexicon-based technique with the polarity scores (positive/negative polarity) provided by a less fine-grained sentiment analysis tool, in order to get more …


Cited by 10 publications (6 citation statements)
References 14 publications
“…There is no prior knowledge available on the number of clusters in students' presentations. However, a study conducted on audio books data suggests that there are 50 different clusters (types of spoken expressions, such as spoken emotion or voice style) in those data (Vanmassenhove et al, 2016). Therefore, we used different number of clusters (m = 5, 10, 15, ..., 100) for generating the audio data representation.…”
Section: Active Audio Data Representation (mentioning)
confidence: 99%
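The cluster-count sweep described in the quoted statement (m = 5, 10, ..., 100) can be sketched as follows. The plain k-means routine and the toy two-dimensional "feature vectors" are illustrative stand-ins, not the clustering method or acoustic features used in the cited work:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on lists of feature vectors (lists of floats)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # assign each point to its nearest centre
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        # recompute centres (keep the old centre if a cluster went empty)
        for j, cl in enumerate(clusters):
            if cl:
                centers[j] = [sum(col) / len(cl) for col in zip(*cl)]
    return centers, clusters

# toy stand-in for acoustic feature vectors: three loose blobs
rng = random.Random(1)
data = [[rng.gauss(m, 0.3), rng.gauss(m, 0.3)] for m in (0, 3, 6) for _ in range(30)]

for m in range(5, 101, 5):   # m = 5, 10, ..., 100 as in the quoted study
    if m > len(data):        # cannot have more clusters than points
        break
    centers, clusters = kmeans(data, m)
    occupied = sum(1 for cl in clusters if cl)
    print(f"m={m}: {occupied} non-empty clusters")
```

In a real experiment, each representation would then be evaluated downstream to pick the best m, rather than just counting non-empty clusters.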
“…SOM is an attractive clustering method in this context as it addresses both topology and distribution, and requires no assumptions regarding the input vectors. Furthermore, it has been previously used for speech segment clustering based on voice styles with good results [30], [31]. Here m represents the number of SOM clusters that correspond to the FEM.…”
Section: Feature Extraction Model (FEM) (mentioning)
confidence: 99%
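As a rough illustration of the SOM-based clustering mentioned above, here is a minimal self-organising map in NumPy. The grid size, decay schedules, and toy data are arbitrary choices for the sketch, not the configuration used in the cited work [30], [31]:

```python
import numpy as np

def train_som(data, grid=(3, 3), iters=500, lr0=0.5, sigma0=1.0, seed=0):
    """Minimal 2-D self-organising map; returns the trained weight grid."""
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.random((h, w, data.shape[1]))
    # grid coordinates of every node, used by the neighbourhood function
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"), axis=-1)
    for t in range(iters):
        x = data[rng.integers(len(data))]
        # best-matching unit: node whose weight vector is closest to x
        d = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(np.argmin(d), (h, w))
        # linearly decaying learning rate and neighbourhood radius
        frac = t / iters
        lr = lr0 * (1 - frac)
        sigma = sigma0 * (1 - frac) + 1e-3
        # Gaussian neighbourhood around the BMU on the grid
        dist2 = ((coords - np.array(bmu)) ** 2).sum(axis=-1)
        g = np.exp(-dist2 / (2 * sigma ** 2))[..., None]
        weights += lr * g * (x - weights)
    return weights

def bmu_of(weights, x):
    """Grid coordinates of the best-matching unit for input x."""
    d = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(d), d.shape)

# toy data: two well-separated blobs standing in for speech-segment features
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.1, (40, 2)), rng.normal(5.0, 0.1, (40, 2))])
weights = train_som(data, grid=(2, 2), iters=300)
print(bmu_of(weights, data[0]), bmu_of(weights, data[40]))
```

Unlike k-means, the SOM imposes a grid topology, so nearby nodes end up with similar weight vectors, which is what makes it attractive for organising voice-style clusters.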
“…For instance, Trilla and Alias (2013) already used sentiment analysis on sentence level for an expressive TTS. Vanmassenhove et al (2016) also used sentiment combined with emotion labels for an HMM-based system. Sudhakar and Bensraj (2014) implemented a TTS in Matlab which used sentiment information trained with fuzzy neural networks evaluated in a news domain.…”
Section: NN-based Expressive Speech Synthesis with Sentiment Embeddings (mentioning)
confidence: 99%