Prediction of Emotion Change From Speech

Huang, Zhaocheng; Epps, Julien

doi:10.3389/fict.2018.00011

Cited by 8 publications

(7 citation statements)

References 71 publications


“…Other related studies have formulated speech emotion recognition problems as detection of changes in the emotional content [130], [131] and detection of deviations from neutral patterns [106], [132].…”

Section: Speech (mentioning)

confidence: 99%
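Detecting deviations from neutral patterns, mentioned in the statement above, can be illustrated with a minimal sketch. It assumes each frame is summarized by a small feature vector, models neutral speech with a per-feature mean and standard deviation, and flags frames whose largest absolute z-score exceeds a threshold. The feature values, the threshold of 2.0 and all function names are hypothetical; the works cited as [106] and [132] may use quite different models.

```python
from statistics import mean, stdev

def fit_neutral_model(neutral_frames):
    """Per-dimension (mean, standard deviation) estimated from frames assumed to be neutral."""
    return [(mean(dim), stdev(dim)) for dim in zip(*neutral_frames)]

def deviation_scores(frames, model):
    """Largest absolute z-score across feature dimensions, one score per frame."""
    scores = []
    for frame in frames:
        z = [abs(x - mu) / sd if sd > 0 else 0.0 for x, (mu, sd) in zip(frame, model)]
        scores.append(max(z))
    return scores

def flag_deviations(frames, model, threshold=2.0):
    """Indices of frames whose deviation score exceeds the (hypothetical) threshold."""
    return [i for i, s in enumerate(deviation_scores(frames, model)) if s > threshold]

# Hypothetical 2-D frame features, e.g. (mean pitch in Hz, energy in dB).
neutral = [(120.0, 60.0), (125.0, 62.0), (118.0, 59.0), (122.0, 61.0)]
test = [(121.0, 60.5), (180.0, 70.0), (119.0, 60.0)]
model = fit_neutral_model(neutral)
print(flag_deviations(test, model))  # → [1]
```

Only the second test frame, with its much higher pitch and energy, falls outside the neutral model; a real system would fit the neutral statistics per speaker and on many more frames.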

The Ordinal Nature of Emotions: An Emerging Approach

Yannakakis

Cowie

Busso

2021

IEEE Trans. Affective Comput.


Computational representation of everyday emotional states is a challenging task and, arguably, one of the most fundamental for affective computing. Standard practice in emotion annotation is to ask people to assign a value of intensity or a class value to each emotional behavior they observe. Psychological theories and evidence from multiple disciplines including neuroscience, economics and artificial intelligence, however, suggest that the task of assigning reference-based values to subjective notions is better aligned with the underlying representations. This paper draws together the theoretical reasons to favor ordinal labels for representing and annotating emotion, reviewing the literature across several disciplines. We go on to discuss good and bad practices of treating ordinal and other forms of annotation data and make the case for preference learning methods as the appropriate approach for treating ordinal labels. We finally discuss the advantages of ordinal annotation with respect to both reliability and validity through a number of case studies in affective computing, and address common objections to the use of ordinal data. More broadly, the thesis that emotions are by nature ordinal is supported by both theoretical arguments and evidence, and opens new horizons for the way emotions are viewed, represented and analyzed computationally.
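The abstract argues for treating annotations ordinally. One simple way to do that, sketched here under stated assumptions, is to discard each annotator's absolute ratings and keep only the pairwise orderings they imply, which is the form of data that preference learning methods consume. The clip names, rating values and the `margin` tie threshold are hypothetical; the example shows two annotators who use the rating scale differently yet imply identical preferences.

```python
from itertools import combinations

def ratings_to_preferences(ratings, margin=0.0):
    """Convert one annotator's {item: rating} into ordered pairs (preferred, other).

    Differences not larger than `margin` are treated as ties and dropped.
    """
    prefs = []
    for a, b in combinations(sorted(ratings), 2):
        diff = ratings[a] - ratings[b]
        if diff > margin:
            prefs.append((a, b))
        elif -diff > margin:
            prefs.append((b, a))
    return prefs

# Hypothetical arousal ratings: annotator B uses a higher part of the scale than A.
annotator_a = {"clip1": 2, "clip2": 5, "clip3": 3}
annotator_b = {"clip1": 5, "clip2": 9, "clip3": 6}
print(ratings_to_preferences(annotator_a))
# → [('clip2', 'clip1'), ('clip3', 'clip1'), ('clip2', 'clip3')]
print(ratings_to_preferences(annotator_a) == ratings_to_preferences(annotator_b))  # → True
```

The pairwise form is unaffected by the offset between the two annotators' scales, which is one intuition behind the reliability argument the abstract mentions.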


“…Huang et al. [9] focused on insight into emotion changes instead of analyzing a single speech file. They detected the instant of emotion change using a GMM-based method on the IEMOCAP database.…”

Section: Discussion (mentioning)

confidence: 99%
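The statement above describes detecting the instant of emotion change with a GMM-based method. The sketch below is a simplified stand-in, not Huang and Epps's method: it fits a single diagonal Gaussian (rather than a GMM) to a window on each side of every candidate boundary and to the two windows pooled, and scores the boundary by the log-likelihood ratio of separate models versus one pooled model. The window length, the one-dimensional feature values and the function names are assumptions for illustration.

```python
import math
from statistics import fmean, pvariance

def gaussian_loglik(frames):
    """Log-likelihood of frames under a diagonal Gaussian fitted to them by maximum likelihood."""
    total = 0.0
    for dim in zip(*frames):
        mu = fmean(dim)
        var = max(pvariance(dim, mu), 1e-6)  # variance floor avoids log(0) on flat windows
        total += sum(-0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var) for x in dim)
    return total

def change_scores(frames, win):
    """Log-likelihood ratio of separate left/right models versus one pooled model at each boundary t."""
    scores = {}
    for t in range(win, len(frames) - win + 1):
        left, right = frames[t - win:t], frames[t:t + win]
        scores[t] = gaussian_loglik(left) + gaussian_loglik(right) - gaussian_loglik(left + right)
    return scores

def detect_change(frames, win=4):
    """Boundary index with the highest score, i.e. the most likely change instant."""
    scores = change_scores(frames, win)
    return max(scores, key=scores.get)

# Hypothetical 1-D feature per frame; the level jumps between frame 5 and frame 6.
frames = [(1.0,), (1.1,), (0.9,), (1.0,), (1.05,), (0.95,),
          (5.0,), (5.1,), (4.9,), (5.0,), (5.05,), (4.95,)]
print(detect_change(frames, win=4))  # → 6
```

A practical detector would also threshold the score, or penalize the extra model as BIC-based segmentation does, instead of always returning the maximum; otherwise it reports a change even in a stretch of constant emotion.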

“…There are two key points that impact the performance of speech emotion recognition [4, 5, 6, 7, 8]: The first is speech feature selection. Because many kinds of features can be extracted from a speech sample, it is difficult to know which ones are most suitable for emotion recognition. Some work [1, 2, 4, 5, 9, 10, 11] shows that prosodic features (e.g., pitch, energy, zero-crossing rate) are important, while other work [4, 5, 8, 9, 10] shows that quality features (e.g., formant frequencies, spectral features) are helpful for speech emotion recognition.…”

Section: Introduction (mentioning)

confidence: 99%
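Two of the prosodic features named in the statement above, energy and zero-crossing rate, are simple to compute from framed audio. The following is a minimal standard-library sketch, assuming a mono signal given as a list of floats; the 25 ms frame and 10 ms hop are common choices, not values taken from the cited works, and the synthetic tone is hypothetical. Pitch is omitted because reliable pitch tracking needs a dedicated algorithm.

```python
import math

def split_frames(signal, frame_len, hop):
    """Split a signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]

def short_term_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(x * x for x in frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs (zero counts as non-negative)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

# Hypothetical input: 1 s of a 200 Hz tone sampled at 8 kHz.
sr = 8000
signal = [math.sin(2 * math.pi * 200 * n / sr + 0.1) for n in range(sr)]
frame_len, hop = sr * 25 // 1000, sr * 10 // 1000  # 25 ms frames (200 samples), 10 ms hop (80)
frames = split_frames(signal, frame_len, hop)
energy = [short_term_energy(f) for f in frames]
zcr = [zero_crossing_rate(f) for f in frames]
print(len(frames))  # → 98
```

Each list is a frame-level contour; utterance-level "prosody features" are usually statistics computed over such contours.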

“…The second point affecting emotion recognition accuracy is the classification method used. Classification methods include SVM-based approaches [4, 5, 7, 13, 14, 15], GMM-based approaches [9], ANN approaches [6, 16], RNNs [1] and BayesNet-based approaches [4, 15]. Although some deep models such as RNNs have been applied, most work in this field uses shallow classifiers, which cannot detect the deep features in speech signals.…”

Section: Introduction (mentioning)

confidence: 99%


Emotion Recognition from Chinese Speech for Smart Affective Services Using a Combination of SVM and DBN

Zhu

Chen

Zhao

et al. 2017

Sensors

111


Accurate emotion recognition from speech is important for applications like smart health care, smart entertainment, and other smart services. High accuracy emotion recognition from Chinese speech is challenging due to the complexities of the Chinese language. In this paper, we explore how to improve the accuracy of speech emotion recognition, including speech signal feature extraction and emotion classification methods. Five types of features are extracted from a speech sample: mel frequency cepstrum coefficient (MFCC), pitch, formant, short-term zero-crossing rate and short-term energy. By comparing statistical features with deep features extracted by a Deep Belief Network (DBN), we attempt to find the best features to identify the emotion status for speech. We propose a novel classification method that combines DBN and SVM (support vector machine) instead of using only one of them. In addition, a conjugate gradient method is applied to train DBN in order to speed up the training process. Gender-dependent experiments are conducted using an emotional speech database created by the Chinese Academy of Sciences. The results show that DBN features can reflect emotion status better than artificial features, and our new classification approach achieves an accuracy of 95.8%, which is higher than using either DBN or SVM separately. Results also show that DBN can work very well for small training databases if it is properly designed.
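The abstract contrasts statistical features with DBN-learned features. Statistical features of this kind are commonly computed by applying functionals to frame-level contours, so that utterances of any length map to a fixed-length vector, which is what a classifier such as an SVM expects. The abstract does not list the statistics used; the five functionals, the contour names and the values below are assumptions for illustration.

```python
from statistics import fmean, pstdev

STATS = ("mean", "std", "min", "max", "range")

def functionals(contour):
    """Summarize a variable-length frame-level contour with fixed statistics."""
    lo, hi = min(contour), max(contour)
    return {"mean": fmean(contour), "std": pstdev(contour), "min": lo, "max": hi, "range": hi - lo}

def utterance_vector(contours):
    """Concatenate the functionals of each named contour, in sorted name order."""
    vec = []
    for name in sorted(contours):
        stats = functionals(contours[name])
        vec.extend(stats[k] for k in STATS)
    return vec

# Hypothetical contours for two utterances of different lengths.
utt_a = {"energy": [0.2, 0.5, 0.9, 0.4], "zcr": [0.05, 0.07, 0.06, 0.05]}
utt_b = {"energy": [0.3, 0.8], "zcr": [0.04, 0.09]}
print(len(utterance_vector(utt_a)), len(utterance_vector(utt_b)))  # → 10 10
```

Sorting the contour names keeps the vector layout identical across utterances, so each position always holds the same statistic of the same feature.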


“…There is a growing interest in developing systems that are dynamic in nature, where emotions are tracked continuously over time, detecting salient segments that deviate from neutral behaviors [7]. Some studies have focused on detecting points where the emotional content changes during a dialog [8]. These research directions are appealing from an application perspective.…”

Section: Introduction (mentioning)

confidence: 99%

Defining Emotionally Salient Regions Using Qualitative Agreement Method

Parthasarathy

Busso

2016

Interspeech 2016


Conventional emotion classification methods focus on predefined segments such as sentences or speaking turns that are labeled and classified at the segment level. However, the emotional state dynamically fluctuates during human interactions, so not all the segments have the same relevance. We are interested in detecting regions within the interaction where the emotions are particularly salient, which we refer to as emotional hotspots. A system with this capability can have real applications in many domains. A key step towards building such a system is to define reliable hotspot labels, which will dictate the performance of machine learning algorithms. Creating groundtruth labels from scratch is both expensive and time consuming. This paper also demonstrates that defining those emotionally salient segments using perceptual evaluation is a hard problem resulting in low inter-evaluator agreement. Instead, we propose to define emotionally salient regions leveraging existing time-continuous emotional labels. The proposed approach relies on the qualitative agreement (QA) method, which dynamically captures increasing or decreasing trends across emotional traces provided by multiple evaluators. The proposed method is more reliable than just averaging traces across evaluators, providing the flexibility to define hotspots at various reliability levels without having to recollect new perceptual evaluations.
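The core qualitative agreement idea, keeping only trends on which evaluators agree instead of averaging their traces, can be sketched in simplified form. This is an assumption-laden illustration, not the paper's formulation: it compares only consecutive non-overlapping windows of each trace, labels each transition as rising, falling or flat using a hypothetical tolerance `eps`, and keeps transitions where the fraction of evaluators sharing the majority direction reaches `level`. The trace values and function names are invented.

```python
from statistics import fmean

def window_means(trace, win):
    """Means of consecutive non-overlapping windows; a trailing partial window is dropped."""
    return [fmean(trace[i:i + win]) for i in range(0, len(trace) - win + 1, win)]

def trends(trace, win, eps):
    """+1 (rising), -1 (falling) or 0 (flat within eps) between consecutive windows."""
    m = window_means(trace, win)
    out = []
    for a, b in zip(m, m[1:]):
        d = b - a
        out.append(1 if d > eps else -1 if d < -eps else 0)
    return out

def agreed_trends(traces, win=2, eps=0.05, level=1.0):
    """(transition index, direction) pairs where enough evaluators share the majority trend."""
    per_evaluator = [trends(t, win, eps) for t in traces]
    result = []
    for k, column in enumerate(zip(*per_evaluator)):
        ups, downs = column.count(1), column.count(-1)
        direction, votes = (1, ups) if ups >= downs else (-1, downs)  # ties go to rising
        if votes and votes / len(column) >= level:
            result.append((k, direction))
    return result

# Hypothetical arousal traces from three evaluators on the same time grid.
e1 = [0.1, 0.1, 0.5, 0.6, 0.6, 0.5]
e2 = [0.0, 0.2, 0.4, 0.4, 0.3, 0.3]
e3 = [0.2, 0.2, 0.7, 0.5, 0.6, 0.6]
print(agreed_trends([e1, e2, e3]))             # → [(0, 1)]
print(agreed_trends([e1, e2, e3], level=0.3))  # → [(0, 1), (1, -1)]
```

Lowering `level` admits transitions supported by fewer evaluators, which loosely mirrors the abstract's point about defining hotspots at various reliability levels without collecting new perceptual evaluations.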
