Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing 2016
DOI: 10.18653/v1/d16-1110
Real-Time Speech Emotion and Sentiment Recognition for Interactive Dialogue Systems

Abstract: In this paper, we describe our approach to enabling an interactive dialogue system to recognize user emotion and sentiment in real time. These modules allow an otherwise conventional dialogue system to show "empathy" and respond to the user while being aware of their emotion and intent. Emotion recognition from speech has previously consisted of feature engineering followed by machine learning, where the first stage causes delay at decoding time. We describe a CNN model to extract emotion from raw speech input without feature e…
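The abstract describes a CNN that reads raw speech directly, skipping the hand-crafted feature-extraction stage. As an illustrative sketch only (not the authors' architecture — all shapes and layer choices below are hypothetical), a strided 1-D convolution over the waveform followed by max pooling over time and a softmax classifier can be written in plain NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(signal, kernels, stride=4):
    """Slide each kernel over the raw waveform (valid mode) with ReLU."""
    k = kernels.shape[1]
    n = (len(signal) - k) // stride + 1
    windows = np.stack([signal[i * stride : i * stride + k] for i in range(n)])
    return np.maximum(windows @ kernels.T, 0.0)  # feature maps: (n, n_kernels)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical setup: 1 s of 16 kHz audio, 8 conv kernels, 4 emotion classes.
wave = rng.standard_normal(16000)          # raw waveform samples
kernels = rng.standard_normal((8, 200)) * 0.05   # learned filters (stand-in weights)
W_out = rng.standard_normal((4, 8)) * 0.1        # output projection (stand-in weights)

feats = conv1d(wave, kernels)   # time-distributed features, no hand-crafted stage
pooled = feats.max(axis=0)      # global max pooling over time
probs = softmax(W_out @ pooled) # distribution over emotion classes
```

The point of the sketch is the pipeline shape: convolution replaces the feature-engineering stage, so the whole model runs in one pass at decoding time.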

Cited by 108 publications (32 citation statements)
References 12 publications
“…So far, to solve the problem of empathetic dialogue response generation, which is to understand the user emotion and respond appropriately (Bertero et al, 2016), there have been mainly two lines of work. The first is a multi-task approach that jointly trains a model to predict the current emotional state of the user and generate an appropriate response based on the state (Lubis et al, 2018; Rashkin et al, 2018).…”
Section: Speaker
confidence: 99%
“…For this reason, we want to create a fully end-to-end model that we hope will automatically extract meaningful features, and to compare it with a standard feature-extraction method. [14], [15] and [16] are only a few of the publications that show the effectiveness of convolution as a feature-extraction layer over raw audio. A drawback of this approach is that adding the feature-extraction task to the model pipeline introduces additional complexity into the end-to-end system, increasing the number of data points needed to train a robust model.…”
Section: Input Features
confidence: 99%
“…Also, other models, such as a deep averaging network (Iyyer et al, 2015), an attention-based network (Winata et al, 2018), and a memory network (Dou, 2017), have been investigated to improve classification performance. Practically, the application of emotion classification has been investigated in interactive dialogue systems (Bertero et al, 2016; Winata et al, 2017; Siddique et al, 2017).…”
Section: Related Work
confidence: 99%