2023
DOI: 10.1007/978-3-031-25075-0_10

Affective Behavior Analysis Using Action Unit Relation Graph and Multi-task Cross Attention

Cited by 4 publications (5 citation statements)
References 13 publications
“…As one can see, smoothing does not work for AU detection but can significantly improve the results for other tasks: up to 0.06 difference in F1-score for EXPR classification and up to 0.06 difference in mean CCC for VA prediction. Moreover, the smoothing works nicely even for blending the best models (Fig. 5).…”

Results table embedded in the quoted passage (method, three per-task scores, and their sum as the overall metric):

    …                                          -       -       -       0.981
    SMMEmotionNet [23]                         0.3648  0.2617  0.4737  1.1002
    Two-Aspect Information Interaction [31]    0.515   0.207   0.385   1.107
    SS-MFAR [4]                                0.397   0.235   0.493   1.125
    EfficientNet-B2 [27]                       0.384   0.302   0.461   1.147
    MAE+ViT [20]                               0.4588  0.3028  0.5054  1.2671
    Cross-attentive module [23]                0.499   0.333   0.456   1.288
    MT-EmotiEffNet + OpenFace [29]             0.447   0.357   0.496   1.300
    MAE+Transformer [37]                       0
    frame-level                                0.4847  0.3578  0.5194  1.3619
    + MT-EmotiDDAMFN, smoothing, tAU = 0.5     0.5578  0.4168  0.5194  1.4939
Section: Multi-task Learning Challenge
confidence: 97%
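The smoothing and the overall metric referenced in this quotation can be illustrated with a short sketch. This is not the cited paper's code: the box-filter kernel, its width, and the tAU = 0.5 threshold on AU probabilities are assumptions, and the overall score is simply the sum of the three per-task scores, which matches the last column of the table above (e.g. 0.447 + 0.357 + 0.496 = 1.300).

```python
import numpy as np

def smooth_predictions(frame_scores: np.ndarray, kernel_size: int = 5) -> np.ndarray:
    """Box-filter smoothing of per-frame model outputs along the time axis.

    frame_scores: shape (num_frames, num_outputs), e.g. valence/arousal values,
    expression scores, or AU probabilities predicted frame by frame.
    """
    kernel = np.ones(kernel_size) / kernel_size
    smoothed = np.empty_like(frame_scores, dtype=float)
    for j in range(frame_scores.shape[1]):
        smoothed[:, j] = np.convolve(frame_scores[:, j], kernel, mode="same")
    return smoothed

def multi_task_score(ccc_va: float, f1_expr: float, f1_au: float) -> float:
    """Overall multi-task metric as the sum of the three per-task scores."""
    return ccc_va + f1_expr + f1_au

# Hypothetical usage: smooth per-frame valence/arousal but keep AU probabilities
# frame-level (the quote reports that smoothing helps VA and EXPR but not AU),
# then threshold AU probabilities at tAU = 0.5.
va_smoothed = smooth_predictions(np.random.uniform(-1, 1, size=(100, 2)))
au_pred = (np.random.rand(100, 12) >= 0.5).astype(int)
```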
“…At the same time, the hybrid CNN (Convolutional Neural Network)-Transformer [33] with a fusion of ResNet-18 and a spatial transformer took 5th place. Slightly better results were obtained by the cross-attentive module and a facial graph that captures the association among action units [23]. The EfficientNet model pre-trained in a multi-task setting (MT-EmotiEffNet) took third place [29].…”
Section: Related Work
confidence: 99%
“…Song et al. [47] construct a co-occurrence knowledge graph and a spatio-temporal Transformer module to capture the temporal and spatial relations of AUs. Nguyen et al. [36] use a facial graph to capture the association among action units for the multi-task learning challenge, and they ranked 4th in the multi-task challenge at the ABAW 2022 competition. These works exhibit the effectiveness of modeling AU relationships.…”
Section: Graph-based AU Recognition Approaches
confidence: 99%
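As a rough, hypothetical illustration of the graph-based idea described in this quotation (not the exact construction of [36] or [47]), an AU relation matrix can be estimated from label co-occurrence statistics and used to mix per-AU node features; all function names and sizes below are assumptions for the sketch.

```python
import numpy as np

def au_cooccurrence_graph(labels: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Build a directed AU relation matrix A with A[i, j] ~ P(AU_j = 1 | AU_i = 1),
    estimated from binary AU annotations of shape (num_samples, num_aus)."""
    counts = labels.T @ labels                 # co-activation counts
    occurrences = labels.sum(axis=0) + eps     # per-AU activation counts
    return counts / occurrences[:, None]

def propagate(node_features: np.ndarray, adjacency: np.ndarray) -> np.ndarray:
    """One graph-convolution-style step: mix per-AU features along relation edges."""
    # Row-normalise so each AU aggregates a weighted average of related AUs.
    norm = adjacency / (adjacency.sum(axis=1, keepdims=True) + 1e-8)
    return norm @ node_features

# Hypothetical usage: 12 AUs, a 64-dimensional feature per AU node.
labels = (np.random.rand(1000, 12) > 0.7).astype(float)
features = np.random.randn(12, 64)
refined = propagate(features, au_cooccurrence_graph(labels))
```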
“…Some of the AU detection approaches in the previous ABAW Competitions [16,17,24] fuse multimodal features, including video and audio, to provide multidimensional information for predicting AUs' occurrence [13,14,50,58]. Meanwhile, other studies found that AU detection performance can benefit from multi-task learning [3,13,36,56], i.e., jointly conducting expression recognition or valence/arousal estimation provides helpful cues for AU detection. Moreover, temporal models such as GRU [6] or Transformer [48] have also been introduced to model temporal dynamics among consecutive frames [37,50].…”
Section: Introduction
confidence: 99%
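A compact sketch of the multi-task formulation these works refer to: a shared embedding feeding separate heads for valence/arousal, expression, and AU outputs, so that the tasks can share cues. The layer sizes, the 8 expression classes, and the 12 AUs are illustrative assumptions, not the cited architectures.

```python
import torch
import torch.nn as nn

class MultiTaskHead(nn.Module):
    """Shared visual embedding with three task-specific heads, so expression and
    valence/arousal supervision can provide cues for AU detection."""

    def __init__(self, feat_dim: int = 512, num_expr: int = 8, num_aus: int = 12):
        super().__init__()
        # A GRU or Transformer over per-frame features could precede these heads
        # to model temporal dynamics, as the quote notes.
        self.shared = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU())
        self.va_head = nn.Linear(256, 2)        # valence and arousal in [-1, 1]
        self.expr_head = nn.Linear(256, num_expr)
        self.au_head = nn.Linear(256, num_aus)  # per-AU sigmoid applied in the loss

    def forward(self, features: torch.Tensor):
        h = self.shared(features)
        return torch.tanh(self.va_head(h)), self.expr_head(h), self.au_head(h)

# Hypothetical joint training: CCC or MSE loss for VA, cross-entropy for EXPR, BCE for AUs.
model = MultiTaskHead()
va, expr_logits, au_logits = model(torch.randn(4, 512))
```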
“…In-the-wild affective behavior analysis is the study of individuals' emotions and moods through their facial expressions, action behavior, and physical characteristics. It has been an important research direction in the fields of mental health treatment, human-computer interaction, and marketing research [1,5,20,24]. Usually, three main representations, i.e., Action Units (AU), Valence-Arousal (VA), and the basic facial expressions (e.g., happy, sad, and neutral), are used to gain insight into how individuals express and experience emotions.…”
Section: Introduction
confidence: 99%