“…The Action Unit (AU) Detection Challenge of the 5th ABAW Competition [20] is based on the Aff-Wild2 database [16-19, 21-24, 55]. Several AU detection approaches in the previous ABAW Competitions [16,17,24] fuse multimodal features, such as video and audio, to provide complementary information for predicting AU occurrence [13,14,50,58]. Meanwhile, other studies found that AU detection can benefit from multi-task learning [3,13,36,56], i.e., jointly performing expression recognition or valence/arousal estimation provides useful cues for AU detection.…”
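To make the multi-task idea concrete, the sketch below shows one plausible way to combine a main AU-detection loss with auxiliary expression and valence/arousal losses. It is an illustration under stated assumptions, not the method of any cited work. All function names, the auxiliary weights `w_expr` and `w_va`, and the choice of `1 - CCC` as the valence/arousal loss are hypothetical choices made here. AU detection is treated as multi-label (one independent binary target per AU), while expression recognition is treated as single-label classification.

```python
import math
from typing import List, Sequence, Tuple


def bce_with_logits(z: float, y: int) -> float:
    # Numerically stable binary cross-entropy on a raw logit z, target y in {0, 1}.
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


def softmax_ce(logits: Sequence[float], target: int) -> float:
    # Softmax cross-entropy over expression classes, using log-sum-exp for stability.
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return lse - logits[target]


def ccc(pred: Sequence[float], gold: Sequence[float], eps: float = 1e-8) -> float:
    # Concordance correlation coefficient over a batch (population moments).
    n = len(pred)
    mp, mg = sum(pred) / n, sum(gold) / n
    vp = sum((p - mp) ** 2 for p in pred) / n
    vg = sum((g - mg) ** 2 for g in gold) / n
    cov = sum((p - mp) * (g - mg) for p, g in zip(pred, gold)) / n
    return 2 * cov / (vp + vg + (mp - mg) ** 2 + eps)


def multitask_loss(
    au_logits: List[List[float]],        # [batch][num_aus] raw AU scores
    au_labels: List[List[int]],          # [batch][num_aus], 1 = AU occurs
    expr_logits: List[List[float]],      # [batch][num_expression_classes]
    expr_labels: List[int],              # [batch] expression class index
    va_pred: List[Tuple[float, float]],  # [batch] predicted (valence, arousal)
    va_gold: List[Tuple[float, float]],  # [batch] annotated (valence, arousal)
    w_expr: float = 0.5,                 # hypothetical auxiliary-task weight
    w_va: float = 0.5,                   # hypothetical auxiliary-task weight
) -> float:
    n = len(au_logits)
    # Main task: multi-label AU detection, BCE averaged over samples and AUs.
    total_au = sum(
        bce_with_logits(z, y)
        for zs, ys in zip(au_logits, au_labels)
        for z, y in zip(zs, ys)
    )
    l_au = total_au / sum(len(zs) for zs in au_logits)
    # Auxiliary task 1: expression classification.
    l_expr = sum(softmax_ce(zs, t) for zs, t in zip(expr_logits, expr_labels)) / n
    # Auxiliary task 2: valence/arousal regression, loss = 1 - CCC per dimension.
    l_va = sum(
        1.0 - ccc([p[d] for p in va_pred], [g[d] for g in va_gold]) for d in range(2)
    ) / 2
    # Weighted sum: the auxiliary terms shape the shared representation used for AUs.
    return l_au + w_expr * l_expr + w_va * l_va
```

In a real model the three heads would share a backbone, so gradients from the auxiliary losses also update the features used for AU prediction. That shared representation is where the cross-task cues would come from. The relative weights are a tuning choice and are not prescribed by the text above.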