“…In this paper, we address the problem of facial behaviour recognition from video, in particular, the problem of recognising apparent emotions in terms of Valence and Arousal [51], and facial expressions in terms of Action Unit intensity [11]. This is a longstanding problem in video recognition which has been extensively studied by the computer vision community [26,64,41,21,33,28,67,74,77,71,55,53,54,39,46]. Nevertheless, even recent methods [27,33,77] struggle to achieve high accuracy on the most difficult datasets including SEWA [34], Aff-Wild2 [29], BP4D [75] and DISFA [42].…”