This paper is the basis paper for the accepted IJCNN challenge One-Minute Gradual-Emotion Recognition (OMG-Emotion), through which we hope to foster long-term emotion classification using neural models for the benefit of the IJCNN community. The novelty of the proposed corpus lies in its data collection and annotation strategy, which is based on emotion expressions that evolve over time within a specific context. In contrast to other corpora, we propose a novel multimodal corpus for emotion expression recognition that uses gradual annotations with a focus on contextual emotion expressions. Our dataset was collected from YouTube videos using a search strategy based on restricted keywords and filtering, which guaranteed that the data follow a gradual emotion expression transition, i.e., emotion expressions evolve over time in a natural and continuous fashion. We also provide an experimental protocol and a series of unimodal baseline experiments that can be used to evaluate deep and recurrent neural models in a fair and standard manner.
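To make the selection step concrete, the following is a minimal sketch of a keyword-and-duration filter of the kind described above. The keyword list, duration bounds, and the candidate-metadata format are illustrative assumptions, not the authors' actual search configuration; selected clips would still require manual review to confirm a gradual emotion transition.

```python
# Hypothetical sketch of keyword- and duration-based candidate filtering.
from dataclasses import dataclass

@dataclass
class Candidate:
    video_id: str
    title: str
    duration_s: int  # clip length in seconds

SEARCH_KEYWORDS = {"monologue", "audition", "storytelling"}  # assumed, not the paper's keywords
MIN_DURATION, MAX_DURATION = 45, 90                          # roughly one-minute clips

def matches_keywords(title: str) -> bool:
    title = title.lower()
    return any(kw in title for kw in SEARCH_KEYWORDS)

def select_candidates(candidates):
    """Keep roughly one-minute videos whose titles match the restricted keywords."""
    return [c for c in candidates
            if matches_keywords(c.title) and MIN_DURATION <= c.duration_s <= MAX_DURATION]

# Example with made-up metadata records:
pool = [
    Candidate("a1", "Dramatic monologue - audition tape", 62),
    Candidate("b2", "Funny cat compilation", 60),
    Candidate("c3", "Storytelling night, full show", 3600),
]
print([c.video_id for c in select_candidates(pool)])  # -> ['a1']
```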
A robot capable of understanding emotion expressions can increase its own capability of solving problems by using emotion expressions as part of its decision-making, in a way similar to humans. Evidence shows that the perception of human interaction starts with an innate perception mechanism in which interactions between different entities are perceived and categorized into two clear directions: positive or negative. As a person develops through childhood, this perception evolves and is shaped by the observation of human interaction, creating the capability to learn different categories of expressions. In the context of human–robot interaction, we propose a model that simulates the innate perception of audio–visual emotion expressions with deep neural networks and learns new expressions by categorizing them into emotional clusters with a self-organizing layer. The proposed model is evaluated with three different corpora: the Surrey Audio–Visual Expressed Emotion (SAVEE) database, the visual Bi-modal Face and Body benchmark (FABO) database, and the multimodal corpus of the Emotion Recognition in the Wild (EmotiW) challenge. We use these corpora to evaluate the model's performance in recognizing emotion expressions and compare it against state-of-the-art approaches.
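As an illustration of the "self-organizing layer" idea, the sketch below clusters feature vectors with a small Self-Organizing Map. The grid size, learning rate, and the random inputs standing in for audio–visual deep-network embeddings are assumptions for illustration, not the authors' exact architecture or training setup.

```python
# Minimal Self-Organizing Map sketch: clusters (assumed) deep-network
# features into map units that play the role of emotional clusters.
import numpy as np

class SelfOrganizingLayer:
    def __init__(self, grid_shape=(4, 4), feature_dim=128, lr=0.5, sigma=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.grid_shape = grid_shape
        # One prototype (weight vector) per map unit.
        self.weights = rng.normal(size=(grid_shape[0] * grid_shape[1], feature_dim))
        # Fixed 2-D coordinates of each unit, used by the neighborhood function.
        self.coords = np.array([(i, j) for i in range(grid_shape[0])
                                        for j in range(grid_shape[1])], dtype=float)
        self.lr = lr
        self.sigma = sigma

    def best_matching_unit(self, x):
        # Index of the unit whose prototype is closest to the input features.
        return int(np.argmin(np.linalg.norm(self.weights - x, axis=1)))

    def update(self, x):
        bmu = self.best_matching_unit(x)
        # Gaussian neighborhood around the winning unit on the 2-D grid.
        dist = np.linalg.norm(self.coords - self.coords[bmu], axis=1)
        h = np.exp(-(dist ** 2) / (2 * self.sigma ** 2))
        # Move each prototype toward the input, weighted by the neighborhood.
        self.weights += self.lr * h[:, None] * (x - self.weights)
        return bmu

# Usage with random stand-ins for audio-visual embeddings.
rng = np.random.default_rng(1)
features = rng.normal(size=(200, 128))
som = SelfOrganizingLayer()
for epoch in range(10):
    for x in rng.permutation(features):
        som.update(x)
cluster_ids = [som.best_matching_unit(x) for x in features]
print(len(set(cluster_ids)), "clusters used")
```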