2022
DOI: 10.48550/arxiv.2204.04083
Preprint

POSTER: A Pyramid Cross-Fusion Transformer Network for Facial Expression Recognition

Abstract: Facial Expression Recognition (FER) has received increasing interest in the computer vision community. As a challenging task, there are three key issues especially prevalent in FER: inter-class similarity, intra-class discrepancy, and scale sensitivity. Existing methods typically address some of these issues, but do not tackle them all in a unified framework. Therefore, in this paper, we propose a two-stream Pyramid crOss-fuSion TransformER network (POSTER) that aims to holistically solve these issues. Specifi…

Cited by 6 publications (11 citation statements)
References 32 publications
“…We evaluate the FER performance of POSTER V2 on the widely used RAF-DB, AffectNet and CAER-S Settings. Similar to POSTER V1 [58], we also use the ir50 [7] network pre-trained on the Ms-Celeb-1M [14] dataset as the image backbone. And MobileFaceNet [2] with frozen weights is used as our facial landmark detector.…”
Section: Experiments Setup (mentioning)
confidence: 99%
“…Among many excellent FER works, POSTER V1 [58] stands out with state-of-the-art (SOTA) performance. POSTER V1 mainly solves three key issues of FER at the same time: inter-class similarity, intra-class discrepancy and scale sensitivity.…”
Section: Introduction (mentioning)
confidence: 99%
“…POSTER: The two-stream Pyramid crOss-fuSion TransformER network (POSTER) (Zheng, Mendieta, and Chen 2022) is proposed to address the challenges of facial expression recognition. It effectively integrates facial landmark and direct image features using a transformer-based cross-fusion paradigm and employs a pyramid structure to ensure scale invariance.…”
Section: Visual Features (mentioning)
confidence: 99%
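
The statement above describes fusing landmark and image features with a transformer. A minimal sketch of generic cross-attention between two token streams is given here for orientation only. It is not POSTER's published architecture: the token counts, the feature dimension, the shared random projection matrices, and every function name are assumptions made for illustration, and the pyramid structure is omitted.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Queries come from one stream; keys and values come from the other."""
    q = matmul(query_tokens, w_q)
    k = matmul(context_tokens, w_k)
    v = matmul(context_tokens, w_v)
    scale = math.sqrt(len(q[0]))
    k_transposed = [list(col) for col in zip(*k)]
    scores = matmul(q, k_transposed)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned features or projection weights."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim = 4  # assumed feature dimension
image_tokens = random_matrix(6, dim, rng)     # assumed: 6 image tokens
landmark_tokens = random_matrix(5, dim, rng)  # assumed: 5 landmark tokens
w_q, w_k, w_v = (random_matrix(dim, dim, rng) for _ in range(3))

# Each stream queries the other, so information flows in both directions.
landmark_guided, lm_weights = cross_attention(landmark_tokens, image_tokens, w_q, w_k, w_v)
image_guided, img_weights = cross_attention(image_tokens, landmark_tokens, w_q, w_k, w_v)
```

Each output keeps the token count of its query stream, and every row of attention weights sums to one over the tokens of the other stream.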
“…In 2020, Dosovitskiy et al. [25] proposed a new ViT that trained the model on image classification tasks (e.g., FER) and demonstrated promising results. Since then, numerous Transformer-based models designed for vision-related tasks have garnered significant attention from researchers [43, 44, 45, 46]. These Transformer-based models split an image into tokens.…”
Section: Related Work (mentioning)
confidence: 99%
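
The last statement notes that Transformer-based vision models split an image into tokens. A minimal sketch of that step, assuming a single-channel image stored as a list of rows and non-overlapping square patches, might look as follows. The function name and the toy 4x4 image are hypothetical, and a real model would follow this with a learned linear projection and position embeddings.

```python
def image_to_patch_tokens(image, patch_size):
    """Split a 2D grayscale image into flattened, non-overlapping patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    tokens = []
    # Walk the patch grid row by row; each patch becomes one flat token.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            token = [
                image[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            ]
            tokens.append(token)
    return tokens


# Toy 4x4 image whose pixel value encodes its position (row * 4 + col).
image = [[row * 4 + col for col in range(4)] for row in range(4)]
tokens = image_to_patch_tokens(image, 2)
print(tokens[0])  # [0, 1, 4, 5], the top-left 2x2 patch
```

A 4x4 image with 2x2 patches yields four tokens of four values each. The same loop applied to a 224x224 image with 16x16 patches would yield 196 tokens.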