Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing 2016
DOI: 10.18653/v1/d16-1110
Real-Time Speech Emotion and Sentiment Recognition for Interactive Dialogue Systems

Abstract: In this paper, we describe our approach to enabling an interactive dialogue system to recognize user emotion and sentiment in real time. These modules allow an otherwise conventional dialogue system to show "empathy" and respond to the user while being aware of their emotion and intent. Emotion recognition from speech has previously consisted of feature engineering followed by machine learning, where the first stage causes delay at decoding time. We describe a CNN model to extract emotion from raw speech input without feature e…
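The abstract describes a CNN that reads raw speech directly, skipping the hand-crafted feature-extraction stage. As an illustrative sketch only (not the authors' architecture — all shapes and layer choices below are hypothetical), a strided 1-D convolution over the waveform followed by max pooling over time and a softmax classifier can be written in plain NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(signal, kernels, stride=4):
    """Slide each kernel over the raw waveform (valid mode) with ReLU."""
    k = kernels.shape[1]
    n = (len(signal) - k) // stride + 1
    windows = np.stack([signal[i * stride : i * stride + k] for i in range(n)])
    return np.maximum(windows @ kernels.T, 0.0)  # feature maps: (n, n_kernels)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical setup: 1 s of 16 kHz audio, 8 conv kernels, 4 emotion classes.
wave = rng.standard_normal(16000)          # raw waveform samples
kernels = rng.standard_normal((8, 200)) * 0.05   # learned filters (stand-in weights)
W_out = rng.standard_normal((4, 8)) * 0.1        # output projection (stand-in weights)

feats = conv1d(wave, kernels)   # time-distributed features, no hand-crafted stage
pooled = feats.max(axis=0)      # global max pooling over time
probs = softmax(W_out @ pooled) # distribution over emotion classes
```

The point of the sketch is the pipeline shape: convolution replaces the feature-engineering stage, so the whole model runs in one pass at decoding time.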

Cited by 108 publications (32 citation statements)
References 12 publications
“…So far, to solve the problem of empathetic dialogue response generation, which is to understand the user emotion and respond appropriately (Bertero et al, 2016), there have been mainly two lines of work. The first is a multi-task approach that jointly trains a model to predict the current emotional state of the user and generate an appropriate response based on the state (Lubis et al, 2018; Rashkin et al, 2018).…”
Section: Speaker
confidence: 99%
“…For this reason, we want to create a fully end-to-end model that we hope will automatically extract meaningful features, and to compare it with a standard feature-extraction method. [14], [15] and [16] are only a few of the publications that show the effectiveness of convolution as a feature-extraction layer over raw audio. A drawback of this approach is that adding the feature-extraction task to the model pipeline introduces additional complexity into the end-to-end system, increasing the number of data points needed to train a robust model.…”
Section: Input Features
confidence: 99%
“…Also, other models, such as a deep averaging network (Iyyer et al, 2015), an attention-based network (Winata et al, 2018), and a memory network (Dou, 2017), have been investigated to improve classification performance. Practically, the application of emotion classification has been investigated in interactive dialogue systems (Bertero et al, 2016; Winata et al, 2017; Siddique et al, 2017).…”
Section: Related Work
confidence: 99%