The growing ubiquity of Social Media data offers an attractive perspective for improving the quality of machine learning-based models in several fields, ranging from Computer Vision to Natural Language Processing. In this paper we focus on Facebook posts paired with "reactions" of multiple users, and we investigate their relationships with classes of emotions that are typically considered in the task of emotion detection. We are inspired by the idea of introducing a connection between reactions and emotions by means of First-Order Logic formulas, and we propose an end-to-end neural model that is able to jointly learn to detect emotions and predict Facebook reactions in a multi-task environment, where the logic formulas are converted into polynomial constraints. Our model is trained using a large collection of unsupervised texts together with data labeled with emotion classes and Facebook posts that include reactions. An extended experimental analysis that leverages a large collection of Facebook posts shows that the tasks of emotion classification and reaction prediction can both benefit from their interaction.
Recognizing facial expressions from static images or video sequences is a widely studied but still challenging problem. The recent progresses obtained by deep neural architectures, or by ensembles of heterogeneous models, have shown that integrating multiple input representations leads to state-of-the-art results. In particular, the appearance and the shape of the input face, or the representations of some face parts, are commonly used to boost the quality of the recognizer. This paper investigates the application of Convolutional Neural Networks (CNNs) with the aim of building a versatile recognizer of expressions in static images that can be further applied to video sequences. We first study the importance of different face parts in the recognition task, focussing on appearance and shape-related features. Then we cast the learning problem in the Semi-Supervised setting, exploiting video data, where only a few frames are supervised. The unsupervised portion of the training data is used to enforce two types of coherence, namely temporal coherence and coherence among the predictions on the face parts. Our experimental analysis shows that coherence constraints can improve the quality of the expression recognizer, thus offering a suitable basis to profitably exploit unsupervised video sequences.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.