With the continued development of artificial intelligence (AI) technology, research on interaction technology has become more popular. Facial expression recognition (FER) is an important type of visual information that can be used to understand a human's emotional situation. In particular, the importance of AI systems has recently increased due to advancements in research on AI systems applied to AI robots. In this paper, we propose a new scheme for FER system based on hierarchical deep learning. The feature extracted from the appearance feature-based network is fused with the geometric feature in a hierarchical structure. The appearance feature-based network extracts holistic features of the face using the preprocessed LBP image, whereas the geometric feature-based network learns the coordinate change of action units (AUs) landmark, which is a muscle that moves mainly when making facial expressions. The proposed method combines the result of the softmax function of two features by considering the error associated with the second highest emotion (Top-2) prediction result. In addition, we propose a technique to generate facial images with neutral emotion using the autoencoder technique. By this technique, we can extract the dynamic facial features between the neutral and emotional images without sequence data. We compare the proposed algorithm with the other recent algorithms for CK+ and JAFFE dataset, which are typically considered to be verified datasets in the facial expression recognition. The tenfold cross validation results show 96.46% of accuracy in the CK+ dataset and 91.27% of accuracy in the JAFFE dataset. When comparing with other methods, the result of the proposed hierarchical deep network structure shows up to about 3% of the accuracy improvement and 1.3% of average improvement in CK+ dataset, respectively. In JAFFE datasets, up to about 7% of the accuracy is enhanced, and the average improvement is verified by about 1.5%. INDEX TERMS Artificial intelligence (AI), facial expression recognition (FER), emotion recognition, deep learning, LBP feature, geometric feature, convolutional neural network (CNN).
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.