Affective computing is a key research topic in artificial intelligence that lies at the intersection of psychology and machine intelligence. It concerns the estimation and measurement of human emotions. A person's body language is one of the most significant sources of information during a job interview, and it reflects a deep psychological state that is often missing from other data sources. In this work, we combine two tasks, pose estimation and emotion classification, for emotional body gesture recognition, and propose a deep multi-stage architecture that handles both. Our deep pose decoding method detects and tracks the candidate's skeleton in a video using a combination of a depthwise convolutional network and a detection-based method for 2D pose reconstruction. Moreover, we propose a representation technique based on the superposition of skeletons, which generates for each video sequence a single image synthesizing the different poses of the subject. We call this image a 'history pose image', and it is used as input to a convolutional neural network model based on the Visual Geometry Group (VGG) architecture. We demonstrate the effectiveness of our method in comparison with other state-of-the-art methods on the standard Common Objects in Context (COCO) keypoint dataset and the Face and Body Gesture video database.
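The superposition idea behind the history pose image can be illustrated with a minimal sketch. It assumes that each frame's skeleton is given as 2D joint coordinates in pixels and that bones are listed as pairs of joint indices. The function names, the `edges` list, and the choice to draw later frames more brightly (in the spirit of a motion history image) are assumptions made for illustration; the text above only states that the skeletons are superposed into a single image.

```python
import numpy as np

def draw_segment(canvas, p0, p1, value):
    """Rasterize the segment p0 -> p1 (x, y) by dense sampling, keeping the max intensity."""
    n = int(max(abs(p1[0] - p0[0]), abs(p1[1] - p0[1]))) + 1
    xs = np.rint(np.linspace(p0[0], p1[0], n)).astype(int)
    ys = np.rint(np.linspace(p0[1], p1[1], n)).astype(int)
    h, w = canvas.shape
    inside = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)  # drop off-canvas samples
    np.maximum.at(canvas, (ys[inside], xs[inside]), value)

def history_pose_image(skeletons, edges, height, width):
    """Superpose per-frame skeletons into one grayscale image.

    skeletons: array-like of shape (frames, joints, 2) holding (x, y) pixels.
    edges: hypothetical list of (joint_a, joint_b) index pairs defining bones.
    Assumption: frame t is drawn with intensity (t + 1) / frames, so recent
    poses appear brighter; the source does not specify any weighting.
    """
    skeletons = np.asarray(skeletons, dtype=float)
    canvas = np.zeros((height, width), dtype=np.float32)
    num_frames = len(skeletons)
    for t, joints in enumerate(skeletons):
        weight = (t + 1) / num_frames
        for a, b in edges:
            draw_segment(canvas, joints[a], joints[b], weight)
    return canvas

# Toy example: a single vertical "bone" that shifts right between two frames.
toy_skeletons = [
    [[1, 1], [1, 5]],  # frame 0
    [[3, 1], [3, 5]],  # frame 1
]
toy_image = history_pose_image(toy_skeletons, edges=[(0, 1)], height=8, width=8)
```

The resulting array could then be resized and fed to an image classifier; taking the per-pixel maximum rather than a sum keeps overlapping poses from saturating the image, which is again a design choice of this sketch rather than something stated in the abstract.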