2023
DOI: 10.48550/arxiv.2301.12149
Preprint

POSTER++: A simpler and stronger facial expression recognition network

Abstract: Facial expression recognition (FER) plays an important role in a variety of real-world applications such as human-computer interaction. POSTER V1 achieves the state-of-the-art (SOTA) performance in FER by effectively combining facial landmark and image features through two-stream pyramid cross-fusion design. However, the architecture of POSTER V1 is undoubtedly complex. It causes expensive computational costs. In order to relieve the computational pressure of POSTER V1, in this paper, we propose POSTER V2. It …
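For orientation, the two-stream cross-fusion idea the abstract refers to can be sketched as one stream attending over the other with cross-attention. The snippet below is a minimal illustration, not the authors' implementation; the module name, feature dimensions, and token counts are assumptions.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Illustrative cross-attention block: one stream attends to the other.

    Queries come from the landmark stream, keys/values from the image stream
    (POSTER-style landmark-to-image attention). Dimensions are assumptions.
    """
    def __init__(self, dim=512, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, landmark_tokens, image_tokens):
        # landmark_tokens: (B, N_lm, dim), image_tokens: (B, N_img, dim)
        fused, _ = self.attn(query=landmark_tokens,
                             key=image_tokens,
                             value=image_tokens)
        return self.norm(landmark_tokens + fused)  # residual add + norm

# Example usage with random tensors standing in for backbone outputs.
if __name__ == "__main__":
    img_feats = torch.randn(2, 49, 512)   # e.g. 7x7 image patch tokens
    lm_feats = torch.randn(2, 68, 512)    # e.g. 68 facial landmark tokens
    out = CrossAttentionFusion()(lm_feats, img_feats)
    print(out.shape)  # torch.Size([2, 68, 512])
```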

Cited by 11 publications (12 citation statements)
References 31 publications
“…Hence, researchers also have begun to introduce the Transformer or the ViT architecture for FER in recent years [23]- [29], motivated by their performance achieved across different tasks. Based on the results posted in existing work, the application of vision transformers in FER is proven to be useful with ViT+SE [23] posting the state-of-the-art performance of 99.80% mean accuracy across 10 folds on the CK+ database, POSTER++ [27] being the best-performing method on the RAF-DB database with 92.21% accuracy, and POSTER [25] being the second best-performing method on the FERPlus database by achieving 91.62% accuracy. However, this performance often comes with large architectures with significantly more parameters than CNN-based methods.…”
Section: B. Vision Transformers (mentioning)
confidence: 97%
“…The POSTER network architecture is optimized for the FER task, but its structure is not only complex but also incurs expensive computational costs; POSTER++ was therefore proposed, building on POSTER, to alleviate this computational pressure. POSTER++ improves on POSTER in three ways: cross fusion, dual streaming, and multi-scale feature extraction [4].…”
Section: POSTER (mentioning)
confidence: 99%
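The multi-scale feature extraction named in the statement above generally means collecting feature maps from several backbone stages and projecting them to a common token dimension before fusion. A rough sketch of that pattern follows, with assumed stage channel counts and an illustrative module name; it is not the paper's code.

```python
import torch
import torch.nn as nn

class MultiScaleTokens(nn.Module):
    """Project feature maps from several backbone stages to a common
    token dimension and concatenate them along the sequence axis.
    Stage channel counts below are assumptions for illustration."""
    def __init__(self, in_channels=(128, 256, 512), dim=512):
        super().__init__()
        self.projs = nn.ModuleList(nn.Conv2d(c, dim, kernel_size=1)
                                   for c in in_channels)

    def forward(self, feature_maps):
        tokens = []
        for proj, fmap in zip(self.projs, feature_maps):
            x = proj(fmap)                               # (B, dim, H, W)
            tokens.append(x.flatten(2).transpose(1, 2))  # (B, H*W, dim)
        return torch.cat(tokens, dim=1)                  # multi-scale token sequence

# Example: three stages of decreasing spatial size.
if __name__ == "__main__":
    feats = [torch.randn(2, 128, 28, 28),
             torch.randn(2, 256, 14, 14),
             torch.randn(2, 512, 7, 7)]
    print(MultiScaleTokens()(feats).shape)  # torch.Size([2, 1029, 512])
```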
“…POSTER2: The proposed POSTER2 (Mao et al. 2023) aims to improve upon the complex architecture of POSTER, which achieves state-of-the-art performance in facial expression recognition (FER) by combining facial landmark and image features through a two-stream pyramid cross-fusion design. POSTER2 reduces computational cost by using a window-based cross-attention mechanism, removing the image-to-landmark branch in the two-stream design, and combining images with the landmarks' multi-scale features.
Section: Visual Features (mentioning)
confidence: 99%
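The window-based cross-attention described in the statement above restricts attention to local windows so its cost grows with the window size rather than with the full token grid. The following is a simplified sketch under assumed shapes; the window size, module name, and grid layout are illustrative and not taken from the POSTER2 code.

```python
import torch
import torch.nn as nn

def window_partition(x, ws):
    """(B, H, W, C) -> (B * num_windows, ws*ws, C); H and W must divide by ws."""
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)

class WindowCrossAttention(nn.Module):
    """Cross-attention computed inside local windows instead of globally,
    which reduces the quadratic cost of attention. Illustrative only."""
    def __init__(self, dim=256, num_heads=4, window_size=7):
        super().__init__()
        self.ws = window_size
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, q_map, kv_map):
        # q_map, kv_map: (B, H, W, C) feature maps on the same spatial grid.
        q = window_partition(q_map, self.ws)
        kv = window_partition(kv_map, self.ws)
        out, _ = self.attn(q, kv, kv)   # attention restricted to each window
        return out                      # (B * num_windows, ws*ws, C)

# Example: landmark-stream queries attend to image-stream keys/values.
if __name__ == "__main__":
    img = torch.randn(2, 14, 14, 256)   # image-stream feature map
    lm = torch.randn(2, 14, 14, 256)    # landmark-stream feature map
    print(WindowCrossAttention()(lm, img).shape)  # torch.Size([8, 49, 256])
```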