2023
DOI: 10.3390/biomimetics8020199

Distract Your Attention: Multi-Head Cross Attention Network for Facial Expression Recognition

Abstract: This paper presents a novel facial expression recognition network, called Distract your Attention Network (DAN). Our method is based on two key observations in biological visual perception. Firstly, multiple facial expression classes share inherently similar underlying facial appearance, and their differences could be subtle. Secondly, facial expressions simultaneously exhibit themselves through multiple facial regions, and for recognition, a holistic approach by encoding high-order interactions among local fe…

Cited by 124 publications (19 citation statements)
References 67 publications
“…EAC [28] randomly removes facial regions to strengthen the model's learning of the non-noisy parts, thereby improving its performance. DAN [47] adopts a multi-task joint approach to capture both local and global feature information. Compared with these methods, our proposed method captures local and global features while using adaptive learning to retain the tokens most relevant to expression categorization, thereby removing noisy and occluded regions and improving computational efficiency.…”
Section: Comparison Study
confidence: 99%
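To make the contrast concrete, the region-removal ingredient of EAC-style training can be illustrated with off-the-shelf tools. The sketch below uses torchvision's RandomErasing as a stand-in for randomly removing facial regions; it is only an illustration of that one ingredient, and EAC's flipped-view attention-consistency loss is not shown.

```python
import torch
from torchvision import transforms as T

# Minimal sketch of the region-removal ingredient of EAC-style training,
# assuming face crops already converted to (C, H, W) tensors. RandomErasing
# blanks a random rectangle so the model cannot rely on any single facial
# region; EAC's attention-consistency loss is omitted here.
erase = T.RandomErasing(p=0.5, scale=(0.02, 0.25), ratio=(0.3, 3.3), value=0)

face = torch.rand(3, 224, 224)  # placeholder face crop
augmented = erase(face)         # with probability 0.5, a region is zeroed out
```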
“…Two key observations of biological visual perception form the basis of DAN. Multiple facial expression categories share an essentially similar underlying appearance, with only very small disparities between them, and each expression manifests through multiple facial regions at once, so discriminating between expressions requires encoding high-order interactions among local features. With these issues as the main focus, three components of DAN are proposed: the Feature Clustering Network (FCN), the Multi-head cross Attention Network (MAN), and the Attention Fusion Network (AFN) [6].…”
Section: Distracted Attention Network (DAN)
confidence: 99%
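Read alongside the description above, the three components form a pipeline: a backbone (FCN) extracts a shared feature map, several parallel attention heads (MAN) each highlight a different facial region, and a fusion stage (AFN) combines the attended features for classification. The PyTorch sketch below is a hypothetical minimal rendering of that pipeline, not the authors' exact layers; the head design, the fusion rule, and the paper's partition loss are all simplified away.

```python
import torch
import torch.nn as nn

class AttentionHead(nn.Module):
    """One spatial-attention head: a simplified stand-in for a MAN head."""
    def __init__(self, channels: int):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)   # per-location score

    def forward(self, feat: torch.Tensor) -> torch.Tensor:   # feat: (B, C, H, W)
        attn = torch.sigmoid(self.score(feat))                # soft mask (B, 1, H, W)
        return (feat * attn).flatten(2).mean(-1)              # attended vector (B, C)

class DANSketch(nn.Module):
    """Hypothetical FCN -> MAN -> AFN pipeline; names mirror the text above."""
    def __init__(self, backbone: nn.Module, channels: int = 512,
                 num_heads: int = 4, num_classes: int = 7):
        super().__init__()
        self.fcn = backbone                                   # feature extractor
        self.man = nn.ModuleList([AttentionHead(channels) for _ in range(num_heads)])
        self.afn = nn.Linear(channels, num_classes)           # naive fusion + classifier

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.fcn(x)                                    # (B, C, H, W)
        heads = torch.stack([h(feat) for h in self.man], 1)   # (B, num_heads, C)
        return self.afn(heads.mean(dim=1))                    # average heads, classify

# Usage with a toy backbone standing in for the FCN:
backbone = nn.Sequential(nn.Conv2d(3, 512, kernel_size=7, stride=4), nn.ReLU())
logits = DANSketch(backbone)(torch.randn(2, 3, 224, 224))     # shape (2, 7)
```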
“…This branch is then combined with the local-global information extracted by MobileViT. From the visualization results in Figure 11, both optical flow and MobileViT focus more on the regions around the eyes and the corners of the mouth, while the Swin Transformer focuses on the spatial information between the eyebrows and in the cheek and chin areas. If the optical flow map is fed into MobileViT (experiment (b)), the optical flow method and MobileViT attend to similar regions, which causes the network to duplicate information and generate unnecessary redundancy [41]. In contrast, in experiment (a), the optical flow map is sent to the Swin Transformer, enabling this branch to focus on complementary ME regions and extract detailed features more comprehensively.…”
Section: The Validity Analysis on Feature Extraction
confidence: 99%
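For readers unfamiliar with the optical-flow input this branch consumes, the sketch below shows one common way to turn two consecutive grayscale frames into a color-coded flow map using OpenCV's Farneback estimator. This is an illustrative stand-in, assuming the cited work feeds a flow image of this general kind to its transformer branch; the actual flow estimator and encoding in that work may differ.

```python
import cv2
import numpy as np

def flow_map(prev_gray: np.ndarray, next_gray: np.ndarray) -> np.ndarray:
    """Color-code dense optical flow between two grayscale frames (H, W)."""
    # Farneback params: pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv = np.zeros((*prev_gray.shape, 3), dtype=np.uint8)
    hsv[..., 0] = (ang * 90 / np.pi).astype(np.uint8)   # hue encodes direction
    hsv[..., 1] = 255                                    # full saturation
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255,       # brightness encodes magnitude
                                cv2.NORM_MINMAX).astype(np.uint8)
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)          # image fed to the flow branch
```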