2021
DOI: 10.1155/2021/5585041
Hierarchical Attention‐Based Multimodal Fusion Network for Video Emotion Recognition

Abstract: Context, such as scenes and objects, plays an important role in video emotion recognition, and recognition accuracy improves further when context information is incorporated. Although previous research has considered context information, the emotional clues contained in different images may differ, which is often ignored. To address the problem of emotion differences between modalities and between images, this paper proposes a hierarchical attention-based multimodal fusio…
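The abstract describes weighting emotional clues that vary across images and across modalities. A minimal illustrative sketch of that idea (not the paper's implementation; the softmax attention form, the two-level pooling, and all variable names here are assumptions) is frame-level attention within each modality followed by modality-level attention over the pooled summaries:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax so attention weights sum to 1.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(features, w):
    """Frame-level attention: score each frame, then take the
    weighted average so more informative frames contribute more."""
    scores = features @ w            # (num_frames,)
    alpha = softmax(scores)          # frame attention weights
    return alpha @ features          # (feat_dim,) pooled summary

def hierarchical_fusion(visual, context, w_v, w_c, w_m):
    """Two-level (hierarchical) fusion: pool frames within each
    modality, then attend over the modality-level vectors."""
    v = attention_pool(visual, w_v)   # visual-stream summary
    c = attention_pool(context, w_c)  # context-stream summary
    modal = np.stack([v, c])          # (2, feat_dim)
    beta = softmax(modal @ w_m)       # modality attention weights
    return beta @ modal               # fused video representation

rng = np.random.default_rng(0)
feat_dim = 8
visual = rng.normal(size=(5, feat_dim))   # 5 frames of visual features
context = rng.normal(size=(3, feat_dim))  # 3 frames of scene/object features
w_v, w_c, w_m = (rng.normal(size=feat_dim) for _ in range(3))
fused = hierarchical_fusion(visual, context, w_v, w_c, w_m)
print(fused.shape)  # (8,)
```

In a trained network the scoring vectors would be learned parameters and the features would come from CNN backbones; here random values stand in purely to show the data flow.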

Cited by 9 publications (12 citation statements)
References 35 publications
“…To evaluate the performance of the proposed method, we conduct experiments on four publicly available video emotion recognition datasets, namely, the MHED dataset [35], the HEIV dataset [36], the VideoEmotion-8 dataset [37], and the Ekman-6 dataset [20].…”
Section: Methodsmentioning
confidence: 99%
“…To evaluate the performance of the proposed method, we conduct experiments on four publicly available video emotion recognition datasets, namely, the MHED dataset [35], the HEIV dataset [36], the VideoEmotion-8 dataset [37], and the Ekman-6 dataset [20]. MHED [35]: the MHED dataset is composed of 1066 videos, with a training set of 638 videos and a testing set of 428 videos.…”
Section: Experiments' Evaluationmentioning
confidence: 99%
“…We conduct experiments on five publicly available video emotion recognition datasets, namely the MHED dataset [28], the HEIV dataset [29], the Ekman-6 dataset [30], the VideoEmotion-8 dataset [31], and the SFEW dataset [32]. The MHED dataset is composed of 1,066 videos that are manually downloaded from the network, with a training set of 638 videos and a testing set of 428 videos.…”
Section: Methodsmentioning
confidence: 99%
“…To this end, recent studies designed neural networks and optimised model parameters [4]. Although previous studies [5][6][7][8][9][10][11][12][13] have achieved promising progress, it is still challenging to analyse the emotions induced by videos.…”
mentioning
confidence: 99%