2021
DOI: 10.48550/arxiv.2109.07270
Preprint
Distract Your Attention: Multi-head Cross Attention Network for Facial Expression Recognition

Abstract: We present a novel facial expression recognition network, called Distract your Attention Network (DAN). Our method is based on two key observations. Firstly, multiple classes share inherently similar underlying facial appearance, and their differences could be subtle. Secondly, facial expressions exhibit themselves through multiple facial regions simultaneously, and the recognition requires a holistic approach by encoding high-order interactions among local features. To address these issues, we propose our DAN…

Cited by 30 publications (44 citation statements)
References 51 publications

“…As a result, for both gender and race groups, CL models achieve high fairness scores by balancing performance across the domain splits while also offering competitive accuracy scores, with the SI model performing best in terms of fairness (see Table 11). Yet, NR achieves a competitive average accuracy score (across domain groups) of 0.771 for gender and 0.807 for race evaluations, the highest amongst the compared methods, with the benchmark evaluation on RAF-DB at 0.853 [58]. Selective updates of model parameters to mitigate forgetting allow CL models to maintain high accuracy scores across the different gender and race attributes.…”
Section: Facial Expression Recognition (mentioning)
confidence: 98%
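The excerpt above does not spell out the fairness metric used by the citing work. As a loose illustration of what "balancing performance across the domain splits" can mean in practice, the hypothetical sketch below reports per-group accuracy, its average (the kind of number quoted as 0.771/0.807 above), and the gap between the best- and worst-performing groups; it is not the cited evaluation code.

```python
# Hypothetical sketch only: the citing paper's exact fairness score is not
# given in the excerpt. A common report is per-group accuracy, its average,
# and the best-to-worst gap (smaller gap = more balanced).
import numpy as np

def group_accuracy_report(correct: np.ndarray, groups: np.ndarray) -> dict:
    """correct: 0/1 per sample; groups: demographic label (e.g. gender) per sample."""
    accs = {g: float(correct[groups == g].mean()) for g in np.unique(groups)}
    return {
        "per_group": accs,
        "average": float(np.mean(list(accs.values()))),
        "gap": float(max(accs.values()) - min(accs.values())),
    }
```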
“…Afterwards, a multi-head attention network, consisting of a spatial attention unit combined with a channel attention unit, takes the features and outputs an attention map. Finally, an attention fusion network merges the attention maps so that they are learned in an orchestrated fashion [10]. Second, after applying the process mentioned above, the final feature information is fed to a fully connected layer and a batch normalization layer.…”
Section: Model Architecture (mentioning)
confidence: 99%
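The statement above describes each attention head as a spatial attention unit paired with a channel attention unit whose maps are then fused across heads. Below is a minimal PyTorch-style sketch of that idea; it is not the DAN authors' implementation, and the module names, layer sizes, and final pooling are assumptions for illustration.

```python
# Minimal sketch (not the authors' code) of one attention head combining a
# spatial attention unit and a channel attention unit, as described above.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excite style channel weighting (illustrative)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                     # x: (B, C, H, W)
        w = self.fc(self.pool(x).flatten(1))  # (B, C) channel weights
        return x * w.unsqueeze(-1).unsqueeze(-1)

class SpatialAttention(nn.Module):
    """1x1-conv spatial map over the feature grid (illustrative)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):                     # x: (B, C, H, W)
        m = torch.sigmoid(self.conv(x))       # (B, 1, H, W) spatial map
        return x * m

class AttentionHead(nn.Module):
    """One head: spatial unit then channel unit, pooled to a per-head vector."""
    def __init__(self, channels: int):
        super().__init__()
        self.spatial = SpatialAttention(channels)
        self.channel = ChannelAttention(channels)

    def forward(self, x):
        out = self.channel(self.spatial(x))
        return out.mean(dim=(2, 3))           # (B, C) attended feature per head
```

In the cited design, several such heads would run in parallel and an attention fusion network would merge their outputs before the fully connected and batch normalization layers mentioned in the quote.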
“…As evaluation metrics, the mean Concordance Correlation Coefficient (CCC) of valence and arousal and the average F1 score across all 8 categories were used for the VA estimation challenge and the expression classification challenge, respectively. In this paper, we propose an extended version of the DAN model, based on ResNet with the attention mechanisms proposed by [10], to solve the challenges mentioned above, and present the preliminary results on the official validation set. [Figure 1: Overview of the architecture used in this study.] More precise and detailed results can be updated and added through the subsequent submissions to the competition.…”
Section: Introduction (mentioning)
confidence: 99%
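For the two metrics named above, here is a short sketch using their standard definitions; it is not the challenge's official evaluation code, and the function names and array shapes are assumed for illustration.

```python
# Standard definitions of the two metrics mentioned above: Concordance
# Correlation Coefficient (CCC) for valence/arousal and macro-averaged F1
# over the 8 expression categories.
import numpy as np
from sklearn.metrics import f1_score

def ccc(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2)."""
    mu_t, mu_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = ((y_true - mu_t) * (y_pred - mu_p)).mean()
    return float(2.0 * cov / (var_t + var_p + (mu_t - mu_p) ** 2))

def mean_ccc(va_true: np.ndarray, va_pred: np.ndarray) -> float:
    """Average CCC over valence and arousal; arrays of shape (N, 2)."""
    return 0.5 * (ccc(va_true[:, 0], va_pred[:, 0]) + ccc(va_true[:, 1], va_pred[:, 1]))

def average_f1(labels: np.ndarray, preds: np.ndarray) -> float:
    """Unweighted (macro) F1 across the 8 expression classes."""
    return f1_score(labels, preds, average="macro")
```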
“…Humans have a remarkable ability to understand emotion, or what people want to convey, since we can attend to major expression features such as winking, mouth opening, sniffing, and tilting. Accordingly, many studies focus on capturing and strengthening local features using attention mechanisms [27,28,49] and obtain promising results. An attention mechanism can be treated as a dynamic process that selects adaptively weighted features according to the different inputs [11].…”
Section: Input Image Expected Attention (mentioning)
confidence: 99%
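To make "a dynamic process that selects adaptively weighted features according to the input" concrete, here is a tiny hypothetical example in which features from K local facial regions are re-weighted by scores computed from the input itself, so different faces emphasize different regions; it is not taken from any of the cited works.

```python
# Hypothetical illustration of attention as input-dependent weighting over
# K local-region features (not from any cited paper).
import torch
import torch.nn as nn

class RegionWeighting(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # one scalar score per region feature

    def forward(self, regions):                            # regions: (B, K, dim)
        w = torch.softmax(self.score(regions), dim=1)      # (B, K, 1), depends on input
        return (w * regions).sum(dim=1)                    # (B, dim) weighted combination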