2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019)
DOI: 10.1109/fg.2019.8756577
A Multimodal LSTM for Predicting Listener Empathic Responses Over Time

Abstract: People naturally understand the emotions of, and often also empathize with, those around them. In this paper, we predict the emotional valence of an empathic listener over time as they listen to a speaker narrating a life story. We use the dataset provided by the OMG-Empathy Prediction Challenge, a workshop held in conjunction with IEEE FG 2019. We present a multimodal LSTM model with feature-level fusion and local attention that predicts empathic responses from audio, text, and visual features. Our best-perform…
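The abstract's pipeline (feature-level fusion of audio, text, and visual features, an LSTM over time, and local attention) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: all dimensions, parameter shapes, and the attention window are hypothetical, and the weights here are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; gates stacked in order [input, forget, cell, output]."""
    H = h.shape[0]
    z = W @ x + U @ h + b
    i = 1 / (1 + np.exp(-z[:H]))
    f = 1 / (1 + np.exp(-z[H:2*H]))
    g = np.tanh(z[2*H:3*H])
    o = 1 / (1 + np.exp(-z[3*H:]))
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

def predict_valence(audio, text, visual, params, window=3):
    """Fuse modalities per frame, run an LSTM, attend locally over the
    last `window` hidden states, and regress a valence value per step."""
    W, U, b, w_att, w_out = params
    H = U.shape[1]
    h = np.zeros(H); c = np.zeros(H)
    history, preds = [], []
    for t in range(len(audio)):
        x = np.concatenate([audio[t], text[t], visual[t]])  # feature-level fusion
        h, c = lstm_step(x, h, c, W, U, b)
        history.append(h)
        ctx_states = np.stack(history[-window:])            # local attention window
        scores = ctx_states @ w_att
        alpha = np.exp(scores - scores.max()); alpha /= alpha.sum()
        context = alpha @ ctx_states                        # attention-weighted context
        preds.append(np.tanh(w_out @ context))              # valence in (-1, 1)
    return np.array(preds)

# Toy dimensions (hypothetical): 4-d audio + 3-d text + 2-d visual = 9-d fused input
D, H, T = 9, 5, 6
params = (rng.standard_normal((4*H, D)) * 0.1,   # W: input weights for all 4 gates
          rng.standard_normal((4*H, H)) * 0.1,   # U: recurrent weights
          np.zeros(4*H),                         # b: gate biases
          rng.standard_normal(H) * 0.1,          # w_att: attention scoring vector
          rng.standard_normal(H) * 0.1)          # w_out: valence regression weights
audio  = rng.standard_normal((T, 4))
text   = rng.standard_normal((T, 3))
visual = rng.standard_normal((T, 2))
valence = predict_valence(audio, text, visual, params)
print(valence.shape)  # one valence prediction per time step
```

The design point the sketch makes concrete is feature-level (early) fusion: the three modality vectors are concatenated into one frame vector before the recurrence, so a single LSTM models cross-modal dynamics jointly.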

Cited by 8 publications (8 citation statements). References 17 publications.
“…Although attention in deep networks is not well understood (e.g., it is still unclear under what theoretical conditions attention is useful) and is likely very different from how human attention is actually implemented in the brain, these attention mechanisms have proven to be surprisingly effective in improving deep neural network performance. Aside from a few very recent papers [17], [25], [26], there has not been much "attention" paid to these attention mechanisms within affective computing. We hope that our results will help to demonstrate the efficacy of such approaches and to encourage more research in this area.…”
Section: Discussion
confidence: 99%
“…As mentioned, many researchers have used LSTMs [11] to predict emotions over time [14]- [17]. In our SFT (Fig.…”
Section: Long Short-Term Memory Network
confidence: 99%
“…[39], [40] and [41] were some of the earlier papers that compared multimodal LSTMs with Support Vector Regressions and other approaches for valence and arousal recognition on the SEMAINE dataset. This subsequently led to a surge of interest in applying LSTMs, especially to time-series emotion recognition on the AVEC 2015 [42], [43], AVEC 2017 [44], [45], AVEC 2018 [46], and OMG-Empathy 2019 [47] challenges. Other noteworthy examples are [48], who investigated bidirectional LSTMs (where there is another recurrence that goes backwards in time), [49], who combined neural attention mechanisms with LSTMs, and [50], who built an LSTM with electroencephalography (EEG) input.…”
Section: Discriminative Models
confidence: 99%
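The bidirectional idea mentioned in the last citation statement (a second recurrence running backwards in time, with the two directions' states concatenated per step) can be sketched as follows. This is an illustrative toy using a plain tanh RNN as a stand-in for an LSTM direction; names and dimensions are hypothetical, not from any cited paper.

```python
import numpy as np

def rnn_pass(xs, W, U, b):
    """Simple tanh RNN over a sequence; stand-in for one LSTM direction."""
    h = np.zeros(U.shape[0])
    out = []
    for x in xs:
        h = np.tanh(W @ x + U @ h + b)
        out.append(h)
    return np.stack(out)

def bidirectional(xs, fwd, bwd):
    hf = rnn_pass(xs, *fwd)                  # forward recurrence in time
    hb = rnn_pass(xs[::-1], *bwd)[::-1]      # backward recurrence, realigned to time order
    return np.concatenate([hf, hb], axis=1)  # per-step concat of both directions

rng = np.random.default_rng(1)
D, H, T = 3, 4, 5
mk = lambda: (rng.standard_normal((H, D)),  # input weights
              rng.standard_normal((H, H)),  # recurrent weights
              np.zeros(H))                  # bias
xs = rng.standard_normal((T, D))
states = bidirectional(xs, mk(), mk())
print(states.shape)  # (5, 8): forward and backward hidden states concatenated
```

Each output step thus summarizes both past and future context, which is why bidirectional variants suit offline time-series emotion recognition but not strictly causal, real-time prediction.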