Facial Affect Recognition based on Transformer Encoder and Audiovisual Fusion for the ABAW5 Challenge

Zhang, Ziyang; An, Liuwei; Cui, Zishun; Xu, Ao; Dong, Tengteng

doi:10.48550/arxiv.2303.09158

Cited by 2 publications (4 citation statements); references 39 publications

“…Moreover, we compare our results with those of ME-Graph [31], and our method outperforms theirs by an average F1-score of 3.1. These results demonstrate the effectiveness of our approach in detecting AUs.

Team | F1-score
… | 51.0
CtyunAI [60] | 48.9
HSE-NN-SberAI [39] | 48.8
USTC-AC [51] | 48.1
HFUT-MAC [59] | 47.5
SCLAB-CNU [35] | 45.6
USC-IHP [53] | 42.9
Baseline [20] | 36.5…”

Section: Results on Validation Set (mentioning)

confidence: 99%
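The snippet above ranks systems by average F1-score over AUs. As a sketch, assuming (as in the ABAW AU detection track) that this score is the unweighted mean of per-AU binary F1, and that labels and predictions are 0/1 arrays of shape (frames, AUs), it could be computed as follows. The function names and the zero-score convention for AUs with no positives are assumptions; the values quoted in the snippet appear to be on a 0 to 100 scale.

```python
import numpy as np


def per_au_f1(y_true: np.ndarray, y_pred: np.ndarray) -> np.ndarray:
    """Binary F1 for each AU column of (num_frames, num_aus) 0/1 arrays."""
    t = y_true.astype(bool)
    p = y_pred.astype(bool)
    tp = np.sum(t & p, axis=0)
    fp = np.sum(~t & p, axis=0)
    fn = np.sum(t & ~p, axis=0)
    denom = 2 * tp + fp + fn
    # Convention assumed here: an AU with no positives in labels or predictions scores 0.
    return np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)


def average_au_f1(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Unweighted mean of the per-AU F1 scores."""
    return float(np.mean(per_au_f1(y_true, y_pred)))


labels = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]])
preds = np.array([[1, 0, 0], [0, 1, 1], [1, 0, 0], [1, 0, 1]])
print(per_au_f1(labels, preds))            # per-AU F1 of about 0.8, 0.667 and 0.8
print(100 * average_au_f1(labels, preds))  # about 75.6 on a 0 to 100 scale
```

Averaging per AU, rather than pooling all AU decisions into one F1, keeps rare AUs from being swamped by frequent ones.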

Spatio-Temporal AU Relational Graph Representation Learning For Facial Action Units Detection

Wang, Song, Luo, et al. 2023

Preprint

This paper presents our Facial Action Unit (AU) recognition submission to the fifth Affective Behavior Analysis in-the-wild (ABAW) Competition. Our approach consists of three main modules: (i) a pre-trained facial representation encoder that produces a strong facial representation from each face image in the input sequence; (ii) an AU-specific feature generator that learns a set of AU features from each facial representation; and (iii) a spatio-temporal graph learning module that constructs a spatio-temporal graph representation. This graph representation describes the AUs in all frames and predicts the occurrence of each AU from both the spatial information modeled within the corresponding face and the temporal dynamics learned across frames. The experimental results show that our approach outperforms the baseline and that spatio-temporal graph representation learning allows our model to achieve the best results among all ablated systems. Our model ranked 4th in the AU recognition track of the 5th ABAW Competition.
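The abstract does not say how the spatio-temporal graph is built, so the following is only a minimal NumPy sketch of the general idea, not the authors' model. It assumes one node per AU per frame, spatial edges between all AUs within a frame, temporal edges between the same AU in consecutive frames, one round of mean-aggregation message passing, and hypothetical per-AU sigmoid classifiers. Modules (i) and (ii) are replaced by random AU features, and the count of 12 AUs is likewise an assumption.

```python
import numpy as np


def build_st_adjacency(num_frames: int, num_aus: int) -> np.ndarray:
    """Adjacency over num_frames * num_aus nodes, where node index = t * num_aus + a.

    Assumed edges (not taken from the paper): every pair of AUs within the same
    frame (spatial) and the same AU in consecutive frames (temporal), plus self-loops.
    """
    n = num_frames * num_aus
    adj = np.eye(n)
    for t in range(num_frames):
        base = t * num_aus
        adj[base:base + num_aus, base:base + num_aus] = 1.0  # spatial edges in frame t
        if t + 1 < num_frames:
            for a in range(num_aus):
                i, j = base + a, base + num_aus + a
                adj[i, j] = adj[j, i] = 1.0  # temporal edge: AU a, frames t and t+1
    return adj


def st_graph_layer(x: np.ndarray, adj: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Average each node's neighbourhood, apply a linear map, then ReLU."""
    deg = adj.sum(axis=1, keepdims=True)
    return np.maximum(((adj @ x) / deg) @ weight, 0.0)


def predict_au_probs(au_feats, weight, cls_w, cls_b):
    """au_feats: (T, N, D) AU-specific features -> (T, N) occurrence probabilities."""
    t, n, d = au_feats.shape
    adj = build_st_adjacency(t, n)
    # Row-major reshape gives node index t * N + a, matching build_st_adjacency.
    h = st_graph_layer(au_feats.reshape(t * n, d), adj, weight).reshape(t, n, -1)
    # Hypothetical per-AU linear classifiers: cls_w is (N, H), cls_b is (N,).
    logits = np.einsum("tnh,nh->tn", h, cls_w) + cls_b
    return 1.0 / (1.0 + np.exp(-logits))


rng = np.random.default_rng(0)
frames, aus, dim, hidden = 4, 12, 16, 8
probs = predict_au_probs(
    rng.normal(size=(frames, aus, dim)),
    0.1 * rng.normal(size=(dim, hidden)),
    0.1 * rng.normal(size=(aus, hidden)),
    np.zeros(aus),
)
print(probs.shape)  # (4, 12)
```

Because every frame's AU nodes also receive messages from the neighbouring frames, each prediction depends on both the within-face AU relations and the short-range temporal context, which is the property the abstract attributes to module (iii).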

“…Achieving a 53.07% F1-score, while not considered high, is still acceptable and reasonable in this type of application [44, 46–50]. The complexity and inherent ambiguity of emotion recognition, coupled with the dataset's representation, make it challenging to achieve notable performance.…”

Section: Discussion (mentioning)

confidence: 99%

“…
Method | F1-score
Yu et al. [46] | 0.3075
Xue et al. [47] | 0.3218
Savchenko [48] | 0.3292
Zhang et al. [49] | 0.3337
Zhou et al. [50] | 0.3532
Proposed approach | 0.5307
* Value in bold represents the best performance.…”

Section: Methods (mentioning)

confidence: 99%

Emotion Recognition beyond Pixels: Leveraging Facial Point Landmark Meshes

Arabian, Abdulbaki Alshirbaji, Chase, et al. 2024

Applied Sciences

Digital health apps have become a staple in daily life, promoting awareness and providing motivation for a healthier lifestyle. With an already overwhelmed healthcare system, digital therapies offer relief to both patient and physician alike. One such planned digital therapy application is the incorporation of an emotion recognition model as a tool for therapeutic interventions for people with autism spectrum disorder (ASD). Diagnoses of ASD have increased relatively rapidly in recent years. To ensure effective recognition of expressions, a system is designed to analyze and classify different emotions from facial landmarks. Facial landmarks combined with a corresponding mesh have the potential of bypassing hurdles of model robustness commonly affecting emotion recognition from images. Landmarks are extracted from facial images using the Mediapipe framework, after which a custom mesh is constructed from the detected landmarks and used as input to a graph convolution network (GCN) model for emotion classification. The GCN makes use of the relations formed from the mesh along with the special distance features extracted. A weighted loss approach is also utilized to reduce the effects of an imbalanced dataset. The model was trained and evaluated with the Aff-Wild2 database. The results yielded a 58.76% mean accuracy on the selected validation set. The proposed approach shows the potential and limitations of using GCNs for emotion recognition in real-world scenarios.
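The abstract outlines a pipeline (landmarks, then a mesh, then a GCN trained with a weighted loss) without giving its details. The sketch below illustrates those pieces in NumPy under stated assumptions: a single standard GCN layer using Kipf and Welling's normalised propagation rule, a hypothetical distance-to-centroid feature standing in for the unspecified distance features, mean pooling over vertices, and inverse-frequency class weights. None of these choices is confirmed as the authors', and the toy mesh is not the MediaPipe topology.

```python
import numpy as np


def mesh_adjacency(num_vertices: int, triangles) -> np.ndarray:
    """Symmetric 0/1 adjacency built from the edges of triangular faces."""
    adj = np.zeros((num_vertices, num_vertices))
    for a, b, c in triangles:
        for i, j in ((a, b), (b, c), (c, a)):
            adj[i, j] = adj[j, i] = 1.0
    return adj


def landmark_features(landmarks: np.ndarray) -> np.ndarray:
    """Coordinates plus a hypothetical distance feature: each landmark's
    Euclidean distance to the centroid of all landmarks."""
    dist = np.linalg.norm(landmarks - landmarks.mean(axis=0), axis=1, keepdims=True)
    return np.concatenate([landmarks, dist], axis=1)


def gcn_layer(h: np.ndarray, adj: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Kipf-Welling propagation: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ h @ weight, 0.0)


def class_logits(landmarks, triangles, w_hidden, w_out):
    """One GCN layer, mean pooling over vertices, then a linear map to class logits."""
    adj = mesh_adjacency(len(landmarks), triangles)
    h = gcn_layer(landmark_features(landmarks), adj, w_hidden)
    return h.mean(axis=0) @ w_out


def class_weights(labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Inverse-frequency weights n / (C * n_c); classes absent from labels get 0."""
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    return np.where(counts > 0, len(labels) / (num_classes * np.maximum(counts, 1.0)), 0.0)


def weighted_cross_entropy(logits: np.ndarray, labels: np.ndarray, weights: np.ndarray) -> float:
    """Per-sample cross-entropy scaled by the true class's weight, normalised by the
    summed weights (the convention of PyTorch's CrossEntropyLoss with reduction='mean')."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(labels)), labels]
    w = weights[labels]
    return float(np.sum(w * nll) / np.sum(w))


rng = np.random.default_rng(0)
landmarks = rng.normal(size=(5, 2))            # 5 toy 2-D landmarks
triangles = [(0, 1, 2), (1, 2, 3), (2, 3, 4)]  # toy mesh
logits = class_logits(landmarks, triangles, rng.normal(size=(3, 8)), rng.normal(size=(8, 4)))
print(logits.shape)  # (4,)
print(class_weights(np.array([0, 0, 0, 1]), 2))  # about 0.667 and 2.0
```

Up-weighting rare classes in the loss is one common way to counter an imbalanced dataset such as the one the abstract mentions; resampling is an alternative that leaves the loss unchanged.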
