2023
DOI: 10.1007/978-3-031-25075-0_2
Affective Behaviour Analysis Using Pretrained Model with Facial Prior

Cited by 7 publications (3 citation statements)
References 18 publications
“…It also used MAE pre-trained weights to enhance its performance. Li et al. [23] also used MAE pre-trained weights combined with AffectNet supervised pre-trained weights and ranked 2nd in ABAW4. Zhang et al. [37] proposed a transformer-based fusion module to fuse multi-modal features from audio, image, and word information.…”
Section: Related Work (mentioning)
confidence: 99%
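The transformer-based fusion of audio, image, and word features mentioned above can be illustrated with a small PyTorch module. This is a minimal sketch, not the actual module from Zhang et al. [37]: the per-modality feature dimensions, the number of output classes, and the mean-pooling over modality tokens are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class TransformerFusion(nn.Module):
    """Fuses per-frame audio, image, and word features with a transformer encoder."""
    def __init__(self, dims=(128, 512, 300), d_model=256, n_heads=4, n_layers=2, n_out=8):
        super().__init__()
        # Project each modality into a shared embedding space.
        self.proj = nn.ModuleList([nn.Linear(d, d_model) for d in dims])
        # Learnable embedding telling the encoder which modality a token came from.
        self.modality_emb = nn.Parameter(torch.zeros(len(dims), d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_out)  # e.g. 8 expression classes (hypothetical)

    def forward(self, feats):
        # feats: list of tensors [(B, dim_audio), (B, dim_image), (B, dim_word)]
        tokens = torch.stack(
            [p(f) + e for p, f, e in zip(self.proj, feats, self.modality_emb)], dim=1)
        fused = self.encoder(tokens)           # (B, 3, d_model)
        return self.head(fused.mean(dim=1))    # pool over modality tokens

# Toy usage with random features
audio, image, word = torch.randn(4, 128), torch.randn(4, 512), torch.randn(4, 300)
logits = TransformerFusion()([audio, image, word])
print(logits.shape)  # torch.Size([4, 8])
```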
“…The top results have been achieved by a Masked Auto-Encoder (MAE) pre-trained on unlabeled face images. For example, the EMMA ensemble of a pre-trained MAE ViT (Vision Transformer) and a CNN took 2nd place [20]. The winner [37] adopted ensembles of various temporal encoders, multi-task frameworks, and unsupervised (MAE-based) and supervised (IResNet/DenseNet-based) visual feature representation learning.…”
Section: Related Work (mentioning)
confidence: 99%
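As a concrete illustration of this kind of ensembling, the snippet below averages the class probabilities of a ViT and a CNN backbone. It is only a hedged sketch: the torchvision models, the equal 0.5/0.5 blending weights, and the eight expression classes are assumptions for illustration; EMMA's actual backbones are initialised from MAE pre-training and fine-tuned on face data.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 8  # hypothetical number of expression classes

# Backbone 1: ViT (in the cited work initialised from MAE pre-training; here left random)
vit = models.vit_b_16(weights=None)
vit.heads.head = nn.Linear(vit.heads.head.in_features, NUM_CLASSES)

# Backbone 2: a CNN
cnn = models.resnet18(weights=None)
cnn.fc = nn.Linear(cnn.fc.in_features, NUM_CLASSES)

def ensemble_predict(x, w_vit=0.5, w_cnn=0.5):
    """Weighted average of per-model softmax probabilities."""
    with torch.no_grad():
        p_vit = torch.softmax(vit(x), dim=1)
        p_cnn = torch.softmax(cnn(x), dim=1)
    return w_vit * p_vit + w_cnn * p_cnn

x = torch.randn(2, 3, 224, 224)  # a toy batch of face crops
print(ensemble_predict(x).argmax(dim=1))
```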
“…As one can see, smoothing does not work for AU detection but can significantly improve the results for other tasks: up to 0.06 difference in F1-score for EXPR classification and up to 0.06 difference in mean CCC for VA prediction. Moreover, the smoothing works nicely even for blending the best models (Fig. 5).

Method                                     VA (CCC)   EXPR (F1)   AU (F1)   Overall
SMMEmotionNet [23]                          0.3648     0.2617      0.4737    1.1002
Two-Aspect Information Interaction [31]     0.515      0.207       0.385     1.107
SS-MFAR [4]                                 0.397      0.235       0.493     1.125
EfficientNet-B2 [27]                        0.384      0.302       0.461     1.147
MAE+ViT [20]                                0.4588     0.3028      0.5054    1.2671
Cross-attentive module [23]                 0.499      0.333       0.456     1.288
MT-EmotiEffNet + OpenFace [29]              0.447      0.357       0.496     1.300
MAE+Transformer [37]                        …
frame-level                                 0.4847     0.3578      0.5194    1.3619
+ MT-EmotiDDAMFN smoothing, t_AU = 0.5      0.5578     0.4168      0.5194    1.4939
…”
Section: Multi-task Learning Challenge (mentioning)
confidence: 99%
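The frame-level smoothing discussed in this excerpt can be reproduced in spirit with a simple moving-average filter over per-frame predictions. This is a hedged sketch only: the window size, the random toy data, and the 0.5 AU threshold (t_AU) are illustrative assumptions rather than the exact post-processing of the cited work.

```python
import numpy as np

def smooth(scores, window=5):
    """Moving-average over time; scores has shape (num_frames, num_outputs)."""
    kernel = np.ones(window) / window
    # Smooth each output channel independently, keeping the sequence length.
    return np.stack([np.convolve(scores[:, c], kernel, mode="same")
                     for c in range(scores.shape[1])], axis=1)

rng = np.random.default_rng(0)
va_raw = rng.uniform(-1, 1, size=(100, 2))   # per-frame valence/arousal scores
au_raw = rng.uniform(0, 1, size=(100, 12))   # per-frame AU probabilities

va_smoothed = smooth(va_raw)                 # smoothing helps VA/EXPR per the excerpt above
au_pred = (au_raw >= 0.5).astype(int)        # AUs thresholded without smoothing (t_AU = 0.5)
print(va_smoothed.shape, au_pred.shape)
```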