“…Such systems, including affective multimodal interfaces, interactive multi-party games, and online services, would facilitate market research analysis, personalized e-commerce, and recruitment as well as enable patient-centric healthcare technologies such as emote monitoring of conditions like pain, anxiety and depression, to mention but a few examples. A fundamental pre-requisite for the development of interfaces like the above mentioned is the deployment of end-to-end machine learning frameworks capable of detecting, tracking, modeling, recognizing and predicting naturalistic -and, consequently, highly ambiguous -human behaviors [1]. Urged by this ever-growing necessity, this paper focuses on temporal dynamics-based behavior prediction in-the-wild, that is in naturalistic, unconstrained conditions.…”