2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
DOI: 10.1109/cvprw56347.2022.00272
Multi-task Learning for Human Affect Prediction with Auditory–Visual Synchronized Representation

Cited by 3 publications (2 citation statements) · References 22 publications
“…The Action Unit Detection Challenge of the 5th ABAW Competition [20] is based on the Aff-Wild2 [16-19, 21-24, 55] database. Some AU detection approaches in previous ABAW Competitions [16,17,24] fuse multimodal features, including video and audio, to provide multidimensional information for predicting the occurrence of AUs [13,14,50,58]. Meanwhile, other studies found that AU detection performance can benefit from multi-task learning [3,13,36,56], i.e., jointly conducting expression recognition or valence/arousal estimation provides helpful cues for AU detection.…”
Section: Introduction
confidence: 99%
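The fusion-plus-multi-task setup described in the statement above can be sketched as a late fusion of audio and visual features feeding three task heads (AU occurrence, expression class, valence/arousal). All dimensions, head sizes, and random weights below are illustrative assumptions, not details from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical dimensions (not from the cited papers): 512-d visual, 128-d audio.
D_VIS, D_AUD, D_FUSED = 512, 128, 256
N_AUS, N_EXPR = 12, 8  # Aff-Wild2 annotates 12 AUs and 8 expression classes

# Randomly initialised weights stand in for trained parameters.
W_fuse = rng.normal(0, 0.02, (D_VIS + D_AUD, D_FUSED))
W_au = rng.normal(0, 0.02, (D_FUSED, N_AUS))
W_expr = rng.normal(0, 0.02, (D_FUSED, N_EXPR))
W_va = rng.normal(0, 0.02, (D_FUSED, 2))

def multitask_forward(visual_feat, audio_feat):
    """Late-fuse audio/visual features, then branch into three task heads."""
    # Concatenate modalities, project to a shared space, apply ReLU.
    fused = np.maximum(np.concatenate([visual_feat, audio_feat], axis=-1) @ W_fuse, 0.0)
    au_probs = sigmoid(fused @ W_au)         # per-AU occurrence probabilities
    expr_probs = softmax(fused @ W_expr)     # expression class distribution
    valence_arousal = np.tanh(fused @ W_va)  # valence/arousal in [-1, 1]
    return au_probs, expr_probs, valence_arousal

batch = 4
au, expr, va = multitask_forward(rng.normal(size=(batch, D_VIS)),
                                 rng.normal(size=(batch, D_AUD)))
```

In this sketch the shared fused representation is what lets the auxiliary heads (expression, valence/arousal) supply training signal that also shapes the AU head.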
“…Some AU detection approaches in previous ABAW Competitions [16,17,24] fuse multimodal features, including video and audio, to provide multidimensional information for predicting the occurrence of AUs [13,14,50,58]. Meanwhile, other studies found that AU detection performance can benefit from multi-task learning [3,13,36,56], i.e., jointly conducting expression recognition or valence/arousal estimation provides helpful cues for AU detection. Moreover, temporal models such as GRU [6] or Transformer [48] are also introduced to model temporal dynamics among consecutive frames [37,50].…”
Section: Introduction
confidence: 99%
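The temporal modelling mentioned in the last sentence above can be illustrated with a minimal GRU cell rolled over per-frame features. The feature and hidden dimensions and the random weights are stand-in assumptions for illustration, not values from the cited works.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

D_IN, D_HID = 64, 32  # hypothetical per-frame feature size and hidden size

# Randomly initialised GRU parameters (stand-ins for trained weights).
Wz, Uz, bz = rng.normal(0, 0.1, (D_IN, D_HID)), rng.normal(0, 0.1, (D_HID, D_HID)), np.zeros(D_HID)
Wr, Ur, br = rng.normal(0, 0.1, (D_IN, D_HID)), rng.normal(0, 0.1, (D_HID, D_HID)), np.zeros(D_HID)
Wh, Uh, bh = rng.normal(0, 0.1, (D_IN, D_HID)), rng.normal(0, 0.1, (D_HID, D_HID)), np.zeros(D_HID)

def gru_step(x, h):
    """One GRU update: gate the previous hidden state with the current frame."""
    z = sigmoid(x @ Wz + h @ Uz + bz)               # update gate
    r = sigmoid(x @ Wr + h @ Ur + br)               # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh + bh)   # candidate state
    return (1.0 - z) * h + z * h_tilde

def encode_sequence(frames):
    """Run the GRU over consecutive frame features, returning all hidden states."""
    h = np.zeros(D_HID)
    states = []
    for x in frames:
        h = gru_step(x, h)
        states.append(h)
    return np.stack(states)

T = 10  # number of consecutive frames
states = encode_sequence(rng.normal(size=(T, D_IN)))
```

Each hidden state summarises the frames seen so far, so per-frame AU predictions made from `states` can exploit temporal dynamics instead of treating frames independently; a Transformer encoder would play the same role with attention over all frames at once.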