2020
DOI: 10.48550/arxiv.2009.14440
Preprint

Affect Expression Behaviour Analysis in the Wild using Spatio-Channel Attention and Complementary Context Information

Cited by 8 publications (8 citation statements)
References 17 publications
“…Dynamic representation-learning approaches possess an inherent advantage and become potential candidates for further consideration. To perform the task at hand, we shortlisted Meng et al. (2019), Kuo et al. (2018), Gera and Balasubramanian (2020), Savchenko (2021), and Kuhnke et al. (2020) based on factors such as performance on open-source FER data sets like CK+ (Lucey et al., 2010) and AFEW (Kossaifi et al., 2017), the depth of the neural network used (which determines the minimum amount of data required for training), and the reproducibility of the results claimed by the authors. Out of the five, Frame Attention Networks (FAN) (Meng et al., 2019) is chosen for its state-of-the-art accuracy on the CK+ (99%) and AFEW (51.18%) data sets and its simple yet effective construction.…”
Section: Related Work (mentioning)
confidence: 99%
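A minimal sketch (Python/PyTorch) of the frame-level attention pooling idea behind FAN is given below; the feature dimension and layer sizes are illustrative assumptions, not the configuration published by Meng et al. (2019).

import torch
import torch.nn as nn

class FrameAttentionPool(nn.Module):
    """Aggregates per-frame CNN features into one video-level descriptor, FAN-style."""

    def __init__(self, feat_dim: int = 512):  # 512-d frame features assumed for illustration
        super().__init__()
        self.attn_fc = nn.Linear(feat_dim, 1)  # learns one relevance score per frame

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (batch, num_frames, feat_dim) from a per-frame CNN backbone
        scores = self.attn_fc(frame_feats)          # (batch, num_frames, 1)
        weights = torch.softmax(scores, dim=1)      # attention weights over frames
        return (weights * frame_feats).sum(dim=1)   # weighted average: (batch, feat_dim)

# Example: pool 16 frames of 512-d features for a batch of 4 clips.
pooled = FrameAttentionPool(512)(torch.randn(4, 16, 512))
print(pooled.shape)  # torch.Size([4, 512])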
“…Considering the problems of unbalanced data and missing labels, Deng et al. [7] propose a Teacher-Student structure to learn from unlabelled data by way of soft labels. Besides the multi-task frameworks, Gera et al. [8] focus on the task of discrete emotion classification and propose a network based on an attention mechanism. Zhang et al. [9] propose a multi-modal approach, M³T, for valence-arousal estimation, using visual features extracted by a 3D convolutional network and a bidirectional recurrent neural network, and audio features extracted by an acoustic sub-network.…”
Section: Automatic Affective Behavior Analysis (mentioning)
confidence: 99%
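For illustration, the two-stream design described above can be sketched as follows (Python/PyTorch); all layer sizes, the GRU choice, and the 40-d acoustic input are assumptions made for the example, not the published M³T configuration.

import torch
import torch.nn as nn

class TwoStreamVA(nn.Module):
    """Toy valence-arousal regressor: 3D-conv + bidirectional RNN visual stream, plus an audio stream."""

    def __init__(self):
        super().__init__()
        # Visual stream: small 3D conv block over (channels, frames, H, W), then a bidirectional GRU.
        self.visual_cnn = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((8, 1, 1)),  # keep 8 temporal steps
        )
        self.visual_rnn = nn.GRU(16, 32, batch_first=True, bidirectional=True)
        # Audio stream: toy MLP over a pre-computed 40-d acoustic feature vector (assumed input).
        self.audio_net = nn.Sequential(nn.Linear(40, 32), nn.ReLU())
        # Fused head regresses valence and arousal.
        self.head = nn.Linear(64 + 32, 2)

    def forward(self, video: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
        # video: (batch, 3, frames, H, W); audio: (batch, 40)
        v = self.visual_cnn(video)                  # (batch, 16, 8, 1, 1)
        v = v.flatten(2).transpose(1, 2)            # (batch, 8, 16) temporal sequence
        _, h = self.visual_rnn(v)                   # h: (2, batch, 32) final hidden states
        v = torch.cat([h[0], h[1]], dim=1)          # (batch, 64)
        a = self.audio_net(audio)                   # (batch, 32)
        return self.head(torch.cat([v, a], dim=1))  # (batch, 2) -> valence, arousal

va = TwoStreamVA()(torch.randn(2, 3, 16, 64, 64), torch.randn(2, 40))
print(va.shape)  # torch.Size([2, 2])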
“…Different from most existing facial emotion datasets [3,4,5,6], which contain only one of the three commonly used emotional representations: Categorical Emotions (CE), Action Units (AU), and Valence-Arousal (VA), the Aff-Wild2 [2] dataset is annotated with all three kinds of emotional labels, containing extended facial behaviors in unconstrained conditions and more subjects/frames than the earlier Aff-Wild [1] dataset. Consequently, multi-task affective recognition can benefit from it; for example, the works [7,8,9,10] participated in the first Affective Behavior Analysis in-the-wild (ABAW) Competition [11].…”
Section: Introduction (mentioning)
confidence: 99%
“…A. W. Yip et al. [14] compared the accuracy of face recognition between color images and gray-scale images and found almost no difference in accuracy above a certain resolution. It was also shown that if a pseudo-color image with adjusted color tones is generated from a gray-scale image, the accuracy is equal to or higher than that of a color image even at low resolution. In emotion estimation, it has been shown that estimation accuracy improves when facial features are extracted with a ResNet pre-trained on the VggFace2 dataset [15][16]. It has also been suggested that emotion-estimation accuracy can be improved by learning from multi-modal information, including audio as well as video [16][17].…”
Section: Related Work (mentioning)
confidence: 99%
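A minimal sketch of extracting face descriptors with a pre-trained ResNet backbone (Python/PyTorch) is shown below. torchvision only ships ImageNet weights, so they serve here as a stand-in for the VggFace2 pre-training referred to in [15][16]; an actual VggFace2 checkpoint would have to be loaded separately.

import torch
from torchvision import models

# ResNet-50 backbone with the classification layer removed (ImageNet weights as a stand-in).
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()
backbone.eval()

# One normalised 224x224 RGB face crop -> 2048-d feature vector that a downstream
# emotion classifier or valence-arousal regressor could consume.
with torch.no_grad():
    face = torch.randn(1, 3, 224, 224)
    feature = backbone(face)
print(feature.shape)  # torch.Size([1, 2048])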