Facial Affect “In-the-Wild”: A Survey and a New Database

Zafeiriou, Stefanos; Papaioannou, Athanasios; Kotsia, Irene; Nicolaou, Mihalis A.; Zhao, Guoying

doi:10.1109/cvprw.2016.186

Cited by 40 publications

(13 citation statements)

References 98 publications

“…Early datasets for FER such as JAFFE [44], CK [42,58], MMI [50], and MultiPie [2] were captured in a lab-controlled environment. Thus, the datasets collected in the wild condition, which contains the people's naturalistic emotion states [4,8,19,20,33,45,47,48,64] have attracted much more attention. Specifically, the AFEW [19] includes video clips extracted from movies and SFEW [20] is built using static images from the subset video clips of AFEW.…”

Section: Related Work, 2.1 Emotion Recognition Datasets (mentioning)

confidence: 99%

Pairwise Emotional Relationship Recognition in Drama Videos: Dataset and Benchmark

Gao, Yin, Zhang, et al. 2021

Proceedings of the 29th ACM International Conference on Multimedia

Recognizing the emotional state of people is a basic but challenging task in video understanding. In this paper, we propose a new task in this field, named Pairwise Emotional Relationship Recognition (PERR). This task aims to recognize the emotional relationship between the two interactive characters in a given video clip. It is different from the traditional emotion and social relation recognition task. Varieties of information, consisting of character appearance, behaviors, facial emotions, dialogues, background music as well as subtitles contribute differently to the final results, which makes the task more challenging but meaningful in developing more advanced multi-modal models. To facilitate the task, we develop a new dataset called Emotional RelAtionship of inTeractiOn (ERATO) based on dramas and movies. ERATO is a large-scale multi-modal dataset for PERR task, which has 31,182 video clips, lasting about 203 video hours. Different from the existing datasets, ERATO contains interaction-centric videos with multi-shots, varied video length, and multiple modalities including visual, audio and text. As a minor contribution, we propose a baseline model composed of Synchronous Modal-Temporal Attention (SMTA) unit to fuse the multi-modal information for the PERR task. In contrast to other prevailing attention mechanisms, our proposed SMTA can steadily improve the performance by about 1%. We expect the ERATO as well as our proposed SMTA to open up a new way for PERR task in video understanding and further improve the research of multi-modal fusion methodology.

CCS Concepts: Computing methodologies → Artificial intelligence; Natural language processing; Computer vision.

“…Many works have been carried out for FBA using different facial behavior features and have applied it to various applications [17,18,19]. Bhatia et al used FBA in order to distinguish melancholia and non-melancholia subjects as it mainly depends on facial affect and mood [20].…”

Section: Related Work (mentioning)

confidence: 99%

Learners’ Efficiency Prediction Using Facial Behavior Analysis

Verma, Nakashima, Kobori, et al. 2021

2021 IEEE International Conference on Image Processing (ICIP)

In the e-learning context, how concentrated and engaged the learner is, or the learners' efficiency, is essential for providing adaptive and flexible materials, timely suggestions, etc., which can lead to efficient learning. In this work, we explore predicting learners' efficiency in a realistic configuration, in which we use a webcam or a laptop PC's built-in camera. Specifically, we first provide a feasible definition of the learners' efficiency, and based on this definition, we predict one's efficiency from facial behavior. We predict the learners' efficiency using various convolutional neural networks. Results are discussed using different evaluation metrics.

“…The contributions of the already developed datasets and benchmarks for analysis of facial expression in the wild have been demonstrated during the challenges in Representation Learning (ICML 2013) [67], in the series of Emotion Recognition in the wild challenges (EmotiW 2013, 2014, 2015 [61, 68, 69, 70], and 2016 (https://sites.google.com/site/emotiw2016/)) and in the recently organized workshop on context-based affect recognition (CBAR 2016 (http://cbar2016.blogspot.gr/)). For a more extended overview on datasets collected in the wild, the reader is referred to [71].…”

Section: Ubiquitous Contextual Information (mentioning)

confidence: 99%

User Adaptive and Context-Aware Smart Home Using Pervasive and Semantic Technologies

Vlachostergiou, Stratogiannis, Siolas, et al. 2016

Journal of Electrical and Computer Engineering

Ubiquitous Computing is moving the interaction away from the human-computer paradigm and towards the creation of smart environments that users and things, from the IoT perspective, interact with. User modeling and adaptation is consistently present, having the human user as a constant, but pervasive interaction introduces the need for context incorporation towards context-aware smart environments. The current article discusses both aspects of the user modeling and adaptation as well as context awareness and incorporation into the smart home domain. Users are modeled as fuzzy personas and these models are semantically related. Context information is collected via sensors and corresponds to various aspects of the pervasive interaction such as temperature and humidity, but also smart city sensors and services. This context information enhances the smart home environment via the incorporation of user-defined home rules. Semantic Web technologies support the knowledge representation of this ecosystem while the overall architecture has been experimentally verified using input from the SmartSantander smart city and applying it to the SandS smart home within FIRE and FIWARE frameworks.
