2020
DOI: 10.1609/aaai.v34i03.5646
MIMAMO Net: Integrating Micro- and Macro-Motion for Video Emotion Recognition

Abstract: Spatial-temporal feature learning is of vital importance for video emotion recognition. Previous deep network structures have often focused on macro-motion, which extends over long time scales, e.g., on the order of seconds. We believe that integrating structures capturing information about both micro- and macro-motion will benefit emotion prediction, because humans perceive both micro- and macro-expressions. In this paper, we propose to combine micro- and macro-motion features to improve video emotion recognition with a …

Cited by 45 publications (27 citation statements)
References 23 publications
“…Each of them is a programmable single unit with multiple inputs. The inputs of each neuron are obtained at the time of launching a new request by using a simple cognitive test that can quantify the subject's reaction time, combined with MIMAMO Net (Deng et al. 2019), which can qualify the subject's emotional response via facial emotion recognition. In other words, FNN is an AI model based on a network of specialized neurons, i.e.…”
Section: Methodology, Artificial Mirror Neuron (AMN) of the Functional
confidence: 99%
“…Specifically, a 3D-CNN pre-trained on human actions is adopted to extract spatiotemporal emotion features from videos [52], [53]. Furthermore, 3D-CNNs pre-trained on sports are also applied in a series of studies [7], [58], which consist of two stages: a two-stream convolutional neural network and a gated recurrent unit network, capturing micro- and macro-motion respectively. The feature representation of a snippet, i.e.…”
Section: Preprocess
confidence: 99%
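The two-stage idea quoted above (per-frame features fused, then a recurrent network over time) can be sketched minimally in NumPy. This is a hedged illustration, not the paper's actual architecture: all dimensions are hypothetical, the random arrays stand in for real CNN "macro" appearance features and phase-difference "micro" features, and the GRU uses the standard gate equations.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, W, U, b):
    """One GRU step; W, U, b stack update/reset/candidate parameters."""
    z = sigmoid(x @ W[0] + h @ U[0] + b[0])        # update gate
    r = sigmoid(x @ W[1] + h @ U[1] + b[1])        # reset gate
    n = np.tanh(x @ W[2] + (r * h) @ U[2] + b[2])  # candidate state
    return (1 - z) * n + z * h

T, D_MACRO, D_MICRO, H = 16, 128, 32, 64  # hypothetical sizes

# Stand-ins for the two streams: per-frame appearance ("macro") features
# from a pretrained CNN, and short-range inter-frame ("micro") features.
macro_feats = rng.standard_normal((T, D_MACRO))
micro_feats = rng.standard_normal((T, D_MICRO))

D = D_MACRO + D_MICRO
W = rng.standard_normal((3, D, H)) * 0.1
U = rng.standard_normal((3, H, H)) * 0.1
b = np.zeros((3, H))

h = np.zeros(H)
for t in range(T):
    x = np.concatenate([macro_feats[t], micro_feats[t]])  # fuse streams
    h = gru_step(x, h, W, U, b)

# A final linear head would regress valence/arousal from h.
print(h.shape)  # prints (64,)
```

Concatenation is the simplest fusion choice here; the cited systems may fuse the streams differently.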
“…• Weighted average F1. [Figure residue from Fig. 6(d), OMG: (Valence, Arousal) results by year (2019–2021), including (0.56, 0.45) [130], (0.529, 0.377) [58], and (0.535, 0.365) [41].]…”
Section: Categorical
confidence: 99%
“…There are many related works in this field, depending on the data sets used and the models proposed [20], [21]. There are also previous studies on emotion recognition in videos [18], [25]. Valence and Arousal (V-A) are not separate values; together, these two parameters describe an emotion.…”
Section: Introduction
confidence: 99%
“…Vielzeuf et al. [14] trained an audiovisual ensemble network for emotion video classification. MIMAMO Net (Deng et al. [18]) trained a spatial-temporal network with CCC loss.…”
Section: Introduction
confidence: 99%
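The CCC loss mentioned in the excerpt above is based on the Concordance Correlation Coefficient, a standard agreement measure for continuous valence/arousal regression. A minimal NumPy sketch (the function names are my own, not from the paper):

```python
import numpy as np

def ccc(preds, labels):
    """Concordance Correlation Coefficient between two 1-D series.

    CCC = 2*cov(p, l) / (var(p) + var(l) + (mean(p) - mean(l))**2),
    ranging from -1 to 1, with 1 meaning perfect agreement.
    """
    preds = np.asarray(preds, dtype=float)
    labels = np.asarray(labels, dtype=float)
    mean_p, mean_l = preds.mean(), labels.mean()
    cov = ((preds - mean_p) * (labels - mean_l)).mean()
    return 2 * cov / (preds.var() + labels.var() + (mean_p - mean_l) ** 2)

def ccc_loss(preds, labels):
    """Training objective: 1 - CCC, so perfect agreement gives loss 0."""
    return 1.0 - ccc(preds, labels)
```

Unlike plain Pearson correlation, CCC also penalizes mean and scale differences between predictions and labels, which is why it is a common objective for dimensional emotion regression.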