2023
DOI: 10.1109/taffc.2021.3095425
Modeling Multiple Temporal Scales of Full-Body Movements for Emotion Classification

Abstract: This work investigates classification of emotions from full-body movements using a novel Convolutional Neural Network-based architecture. The model is composed of two shallow networks that process, in parallel, 8-bit RGB images obtained from time intervals of 3D-positional data. One network performs coarse-grained modelling in the time domain while the other applies fine-grained modelling. We show that combining different temporal scales into a single architecture improves the cl…
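The two-branch idea in the abstract can be sketched as a pair of shallow CNNs whose temporal kernel widths differ, with their pooled features fused before classification. This is a minimal illustrative sketch, not the paper's actual architecture: layer widths, kernel sizes, and the four-class output are assumptions.

```python
import torch
import torch.nn as nn

class TwoScaleEmotionCNN(nn.Module):
    """Illustrative two-stream CNN: both branches see the same RGB
    "movement image" (joints x frames x 3 channels), but the coarse branch
    uses a wider kernel along the time (frame) axis."""

    def __init__(self, num_classes: int = 4):  # number of emotions is assumed
        super().__init__()
        # Fine-grained branch: small temporal receptive field.
        self.fine = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=(3, 3), padding=(1, 1)),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Coarse-grained branch: wide kernel along the time axis.
        self.coarse = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=(3, 9), padding=(1, 4)),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.fine(x).flatten(1)    # (N, 16) fine-scale features
        c = self.coarse(x).flatten(1)  # (N, 16) coarse-scale features
        return self.classifier(torch.cat([f, c], dim=1))

model = TwoScaleEmotionCNN()
# Batch of 2 images: 25 joints (rows) x 64 frames (columns) x RGB.
logits = model(torch.randn(2, 3, 25, 64))
print(logits.shape)  # torch.Size([2, 4])
```

Fusing the two pooled feature vectors by concatenation is one simple way to let a single classifier weigh both temporal scales; the paper's exact fusion strategy may differ.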

Cited by 14 publications (20 citation statements)
References 49 publications
“…a forward head and chest bend express sadness in [40]), ii) recognizing specific gestures which are emblems of the emotions (e.g. raising arms and hands-on-hips are the gestures of pride according to [41] and [42]), or iii) processing the expressive quality of the movement [6], [9], [43], [44]. Out of these three possibilities, the second and the third use the temporal information of the data, while the first one performs only spatial processing.…”
Section: B Human Emotion Recognition From Full-body Movementsmentioning
confidence: 99%
“…For example, [49] used a 3-layer RNN to perform emotion classification from MoCap data of daily activities (clapping, drinking, throwing, waving, etc.) associated with four emotions: happy, angry, sad, and neutral. Beyan et al [6] present the joint training of two CNNs such that one performs coarse-grained modelling while the other applies fine-grained modelling in the time domain. The inputs of this network are 8-bit RGB images obtained from 3D-skeleton data over time.…”
Section: B Human Emotion Recognition From Full-body Movementsmentioning
confidence: 99%
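The "8-bit RGB images obtained from 3D-skeleton data over time" mentioned above are commonly produced by mapping each joint to an image row, each frame to a column, and the (x, y, z) coordinates to the (R, G, B) channels after per-axis min-max normalisation. The sketch below illustrates that general encoding; the paper's exact normalisation and layout may differ.

```python
import numpy as np

def skeleton_to_rgb(seq: np.ndarray) -> np.ndarray:
    """Encode a motion-capture sequence as an 8-bit RGB image.

    seq: (frames, joints, 3) float array of 3D joint positions.
    Returns a (joints, frames, 3) uint8 image where x/y/z map to R/G/B.
    """
    lo = seq.min(axis=(0, 1), keepdims=True)       # per-axis minimum
    hi = seq.max(axis=(0, 1), keepdims=True)       # per-axis maximum
    norm = (seq - lo) / np.maximum(hi - lo, 1e-8)  # scale each axis to [0, 1]
    img = np.round(norm * 255).astype(np.uint8)    # quantise to 8 bits
    return img.transpose(1, 0, 2)                  # joints x frames x 3

rng = np.random.default_rng(0)
seq = rng.normal(size=(64, 25, 3))  # 64 frames, 25 joints, 3D positions
image = skeleton_to_rgb(seq)
print(image.shape, image.dtype)  # (25, 64, 3) uint8
```

Such an image can then be fed directly to standard 2D CNNs, which is what makes this encoding attractive for skeleton-based recognition.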
“…The majority of works mainly concentrated on unimodal learning of emotions [11], [12], [13], i.e., processing a single modality. Although there exist breakthrough achievements in unimodal emotion recognition, due to the aforementioned multimodal nature of emotion expression, such models remain limited in some circumstances.…”
Section: Introductionmentioning
confidence: 99%