In this paper, we describe the results of the HSEmotion team in two tasks of the seventh Affective Behavior Analysis in-the-wild (ABAW) competition, namely, multi-task learning for simultaneous prediction of facial expression, valence, arousal, and action units, and compound expression recognition. We propose an efficient pipeline based on frame-level facial feature extractors pre-trained in a multi-task setting to estimate valence, arousal, and basic facial expressions from a facial photo. We ensure the privacy awareness of our techniques by using lightweight neural network architectures, such as MT-EmotiDDAMFN, MT-EmotiEffNet, and MT-EmotiMobileFaceNet, that can run even on a mobile device without sending facial video to a remote server. We demonstrate that smoothing the neural network output scores with Gaussian or box filters is a significant step toward improving overall accuracy. In particular, this simple post-processing of predictions from a blend of the two top visual models improves the F1-score of facial expression recognition by up to 7%. At the same time, the mean Concordance Correlation Coefficient (CCC) of valence and arousal increases by up to 1.25 times compared to each model's frame-level predictions. As a result, our final performance score on the validation set of the multi-task learning challenge is 4.5 times higher than the baseline (1.494 vs. 0.32).
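A minimal sketch of the score-smoothing step described above, assuming per-frame model outputs stacked into an array; the filter width, sigma, blend weights, and array shapes are illustrative assumptions, not the authors' exact settings:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d, uniform_filter1d

def smooth_scores(frame_scores, method="box", width=11, sigma=3.0):
    """Smooth per-frame model outputs along the time axis.

    frame_scores: array of shape (num_frames, num_outputs), e.g. expression
    logits or valence/arousal predictions for one video. The width/sigma
    values are illustrative, not the paper's tuned parameters.
    """
    scores = np.asarray(frame_scores, dtype=np.float64)
    if method == "box":
        return uniform_filter1d(scores, size=width, axis=0, mode="nearest")
    return gaussian_filter1d(scores, sigma=sigma, axis=0, mode="nearest")

# Example: blend two models' frame-level scores, then smooth the blend.
scores_a = np.random.rand(300, 8)  # dummy outputs of visual model A
scores_b = np.random.rand(300, 8)  # dummy outputs of visual model B
blended = 0.5 * scores_a + 0.5 * scores_b
smoothed = smooth_scores(blended, method="gaussian")
```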
In this paper, we describe the algorithmic approach used for our submissions to the fifth Emotion Recognition in the Wild (EmotiW 2017) group-level emotion recognition sub-challenge. We extracted feature vectors of detected faces using a convolutional neural network trained for the face identification task, rather than the traditional pre-training on emotion recognition problems. In the final pipeline, an ensemble of Random Forest classifiers was learned to predict the emotion score using the available training set. When no faces are detected, one member of our ensemble extracts features from the whole image. In our experimental study, the proposed approach showed the lowest error rate among the explored techniques. In particular, we achieved 75.4% accuracy on the validation data, which is 20% higher than the handcrafted-feature baseline. The source code, built on the Keras framework, is publicly available.
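A minimal sketch of the feature-plus-ensemble pipeline described above, assuming precomputed CNN identification embeddings; the feature dimensions, ensemble size, and estimator counts are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Assume X_faces holds CNN identification-network embeddings aggregated over
# the detected faces of each group image, and X_whole holds embeddings of the
# whole image (the fallback when no faces are found). Shapes are illustrative.
rng = np.random.default_rng(0)
X_faces = rng.normal(size=(500, 256))
X_whole = rng.normal(size=(500, 256))
y = rng.integers(0, 3, size=500)  # e.g. Positive / Neutral / Negative

# Ensemble: several Random Forests on face features plus one member trained
# on whole-image features; class scores are averaged at prediction time.
face_models = [
    RandomForestClassifier(n_estimators=100, random_state=s).fit(X_faces, y)
    for s in range(3)
]
whole_model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_whole, y)

probs = np.mean([m.predict_proba(X_faces) for m in face_models], axis=0)
probs = (probs + whole_model.predict_proba(X_whole)) / 2
predictions = probs.argmax(axis=1)
```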
We explore the problem of classifying composite objects (images, speech signals) when only a small number of models per class is available, and study how to improve recognition performance for medium-sized databases (thousands of classes). The key issue with the fast approximate nearest-neighbor methods widely applied to this task is their heuristic nature: their efficiency can be rigorously proved with the theory of algorithms only for simple similarity measures and artificially generated tasks. In contrast, in this paper we propose an alternative, statistically optimal greedy algorithm. At each step of this algorithm, the joint density (likelihood) of the distances to previously checked models is estimated for each class, and the next model to check is selected from the class with the maximal likelihood. The latter is estimated based on the asymptotic properties of the Kullback-Leibler information discrimination and a mathematical model of a piecewise-regular object whose regular segments have distributions of exponential type. Experimental results in face recognition on the FERET dataset show that the proposed method is much more effective not only than brute force and the baseline (the directed enumeration method), but also than approximate nearest-neighbor methods from the FLANN and NonMetricSpaceLib libraries (randomized kd-tree, composite index, perm-sort).
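A simplified sketch of the greedy maximum-likelihood search described above, assuming a precomputed matrix of model-to-model distances; the Gaussian surrogate score below is an illustrative stand-in for the paper's Kullback-Leibler-based likelihood of exponential-type segments, and the function names are hypothetical:

```python
import numpy as np

def greedy_ml_search(query_dist, model_dists, num_checks):
    """Greedy search: repeatedly check the model whose distances to the
    already-checked models best match the observed query distances.

    query_dist(i) returns the distance between the query and model i;
    model_dists[i, j] is the precomputed distance between models i and j.
    """
    n = model_dists.shape[0]
    checked, observed = [], []
    candidates = set(range(n))
    best, best_dist = None, np.inf
    nxt = 0  # start from an arbitrary model
    for _ in range(num_checks):
        candidates.discard(nxt)
        d = query_dist(nxt)
        checked.append(nxt)
        observed.append(d)
        if d < best_dist:
            best, best_dist = nxt, d
        if not candidates:
            break
        # Score each unchecked model: it is plausible if its distances to the
        # checked models agree with the distances observed for the query
        # (Gaussian surrogate for the exponential-type likelihood).
        scores = []
        for i in candidates:
            diff = model_dists[i, checked] - np.array(observed)
            scores.append((-np.sum(diff ** 2), i))
        nxt = max(scores)[1]
    return best, best_dist
```

In this sketch, the search terminates after a fixed budget of distance computations; the paper instead derives the stopping rule and the likelihood estimate from the asymptotics of the Kullback-Leibler discrimination.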