2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII)
DOI: 10.1109/acii.2019.8925497

Attending to Emotional Narratives

Abstract: Attention mechanisms in deep neural networks have achieved excellent performance on sequence-prediction tasks. Here, we show that these recently proposed attention-based mechanisms, in particular the Transformer with its parallelizable self-attention layers, and the Memory Fusion Network with attention across modalities and time, also generalize well to multimodal time-series emotion recognition. Using a recently introduced dataset of emotional autobiographical narratives, we adapt and apply these two attention m…
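The abstract above is cut off by the source page. As a rough illustration of the kind of model it describes, the sketch below (not the authors' implementation; all dimensions and names are assumed) applies a Transformer encoder with parallelizable self-attention over a sequence of pre-fused multimodal feature vectors and regresses one valence value per time step.

```python
# Minimal sketch (not the authors' code): a Transformer encoder with
# self-attention over a multimodal feature sequence, regressing one
# valence value per time step. Feature sizes are illustrative.
import torch
import torch.nn as nn

class ValenceTransformer(nn.Module):
    def __init__(self, feat_dim=128, n_heads=4, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(feat_dim, 1)   # per-timestep valence

    def forward(self, x):                    # x: (batch, time, feat_dim)
        h = self.encoder(x)                  # parallelizable self-attention
        return self.head(h).squeeze(-1)      # (batch, time)

# Toy usage: 8 clips, 50 time steps, 128-d fused features per step.
model = ValenceTransformer()
valence = model(torch.randn(8, 50, 128))
print(valence.shape)  # torch.Size([8, 50])
```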

Cited by 14 publications (14 citation statements) | References 38 publications
“…In its current form, the SENDv1 has proven to be a useful data set for training emotion recognition models; however, there is still room for future work on the modeling, in order to best extract and integrate the rich information from multiple modalities. For example, all our best-performing models used only one or two modalities, and future research could examine how to better integrate multimodal information to improve performance: We found in a recent investigation [49] that state-of-the-art models with simple concatenation fusion do poorly on multimodal inputs on the SENDv1, and required more sophisticated fusion methods to better integrate multiple modalities.…”
Section: Modeling
confidence: 98%
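The citing authors contrast simple concatenation fusion with more sophisticated fusion methods. A minimal sketch of the concatenation-fusion baseline they refer to, with all modality dimensions and layer sizes assumed for illustration, might look like this:

```python
# Sketch of a simple concatenation-fusion baseline (illustrative sizes):
# per-timestep audio, video, and text features are stacked and fed to a
# shared regressor. More sophisticated fusion would model cross-modal
# interactions instead of just concatenating features.
import torch
import torch.nn as nn

class ConcatFusionRegressor(nn.Module):
    def __init__(self, dims=(88, 512, 300), hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(sum(dims), hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, audio, video, text):
        fused = torch.cat([audio, video, text], dim=-1)  # naive fusion
        return self.net(fused).squeeze(-1)               # valence per step

model = ConcatFusionRegressor()
out = model(torch.randn(4, 50, 88), torch.randn(4, 50, 512), torch.randn(4, 50, 300))
print(out.shape)  # torch.Size([4, 50])
```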
“…This subsequently led to a surge of interest in applying LSTMs, especially to time-series emotion recognition on the AVEC 2015 [42], [43], AVEC 2017 [44], [45], AVEC 2018 [46], and OMG-Empathy 2019 [47] challenges. Other noteworthy examples are [48], who investigated bidirectional LSTMs (which add a second recurrence that runs backwards in time), [49], who combined neural attention mechanisms with LSTMs, and [50], who built an LSTM with electroencephalography (EEG) input. These papers have collectively found that RNNs/LSTMs are powerful models for time-series emotion recognition, whether they rely on extracted low-level features or are combined with features extracted using CNNs.…”
Section: Discriminative Models
confidence: 99%
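As an illustration of the bidirectional-LSTM and attention-plus-LSTM approaches listed above, here is a minimal sketch; the dimensions and the additive-attention pooling are assumptions for illustration, not taken from any of the cited papers.

```python
# Sketch of a bidirectional LSTM with simple additive attention pooling.
# Sizes and the attention form are illustrative assumptions.
import torch
import torch.nn as nn

class BiLSTMAttention(nn.Module):
    def __init__(self, feat_dim=128, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                            bidirectional=True)   # forward + backward recurrence
        self.attn = nn.Linear(2 * hidden, 1)      # scalar score per time step
        self.out = nn.Linear(2 * hidden, 1)

    def forward(self, x):                         # x: (batch, time, feat_dim)
        h, _ = self.lstm(x)                       # (batch, time, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)    # attention weights over time
        context = (w * h).sum(dim=1)              # attention-weighted summary
        return self.out(context).squeeze(-1)      # one rating per sequence

model = BiLSTMAttention()
print(model(torch.randn(8, 50, 128)).shape)  # torch.Size([8])
```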
“…Word embeddings are widely applied in sentiment analysis with neural network models [2,3,19]. However, these models often lack clear interpretations of word vectors [7].…”
Section: Related Work
confidence: 99%
“…In this paper, we used the Stanford Emotional Narratives Dataset (SEND) as our dataset. SEND comprises transcripts of video recordings in which participants shared emotional stories, and it has been well explored in computational models of emotion [12,19]. In each transcript, timestamps were generated for every word based on forced alignments of the audio inputs, and continuous emotional valence ratings were collected by annotators.…”
Section: Dataset
confidence: 99%
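The dataset description above mentions per-word timestamps from forced alignment and continuous valence ratings. A small sketch of how such transcripts might be binned into fixed-length windows and paired with window-level ratings follows; the field names, window length, and pairing scheme are assumptions for illustration only.

```python
# Sketch: bin force-aligned word timestamps into fixed windows and pair
# each window with a valence rating. Window length and field names are
# illustrative assumptions, not the SEND release format.
from collections import defaultdict

def bin_words(words, window_s=5.0):
    """words: list of (word, start_time_s) tuples -> {window_index: [words]}"""
    bins = defaultdict(list)
    for word, start in words:
        bins[int(start // window_s)].append(word)
    return bins

words = [("i", 0.3), ("was", 0.6), ("so", 1.1), ("happy", 1.5), ("then", 5.2)]
ratings = {0: 0.7, 1: 0.4}   # hypothetical valence rating per 5-second window
for idx, toks in sorted(bin_words(words).items()):
    print(idx, " ".join(toks), "-> valence", ratings.get(idx))
```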
“…A self-attention layer was used to learn alignment weights between speech frames and text words from different time-stamps. In addition, Wu et al. [158] employed Transformer-based self-attention to attend to emotional autobiographical narratives. In their study, attention mechanisms were found to be powerful in combination with the Memory Fusion Network for multimodal fusion of the audio, video, and text modalities.…”
Section: Related Work 7.2.1 Attention Mechanisms for Multimodal Learning
confidence: 99%
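As a hedged illustration of the cross-modal attention idea described above (speech frames aligned to text words), the sketch below uses text tokens as queries attending over audio frames; it is not the Memory Fusion Network or the cited authors' code, and all dimensions are assumed.

```python
# Sketch of cross-modal attention: text tokens query audio frames, yielding
# text-aligned audio summaries plus word-to-frame alignment weights that can
# feed a downstream fusion model. Dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, text_dim=300, audio_dim=88, model_dim=128, heads=4):
        super().__init__()
        self.q = nn.Linear(text_dim, model_dim)
        self.kv = nn.Linear(audio_dim, model_dim)
        self.attn = nn.MultiheadAttention(model_dim, heads, batch_first=True)

    def forward(self, text, audio):
        # text: (batch, n_words, text_dim); audio: (batch, n_frames, audio_dim)
        q, kv = self.q(text), self.kv(audio)
        aligned, weights = self.attn(q, kv, kv)   # weights: word-to-frame alignment
        return aligned, weights

xm = CrossModalAttention()
aligned, w = xm(torch.randn(2, 20, 300), torch.randn(2, 200, 88))
print(aligned.shape, w.shape)  # (2, 20, 128) (2, 20, 200)
```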