When neural networks are used to recognize facial expressions, it is essential to determine which features help distinguish different expressions, and networks with many layers lose a substantial amount of information as it is transmitted between layers. This paper proposes a robust vectorized convolutional neural network (CNN) model that introduces an attention mechanism for extracting features from the regions of interest (ROIs) of the face. The ROIs in the facial image are marked before the image is input into the neural network. In particular, the attention concept is adopted in the first layer of the proposed network to perform ROI-related convolution, and the convolution results for the specific fields within the ROIs are increased in order to extract more robust features. Next, the idea of feature vectors, inspired by CapsNet, is used in the following layer of the proposed network. Multi-level convolutions are used to extract feature vectors of the different ROIs for facial expression, and a decoder then reconstructs the image from these feature vectors. Comprehensive comparative experiments and cross-database experiments are conducted to verify the validity and robustness of the proposed model. The experimental results also demonstrate that the method is effective in improving the performance of facial expression recognition.

INDEX TERMS Regions of interest, attention mechanism, facial expression recognition, vectored features, CapsNet.
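The ROI-related convolution described in the abstract can be illustrated with a minimal NumPy sketch. It is one possible reading of the idea, not the model proposed in the paper: the abstract does not specify how the responses are increased, so the multiplicative `gain`, the boolean `roi_mask`, and the function names `conv2d_valid` and `roi_weighted_conv` are all hypothetical choices made for illustration.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Plain 'valid' 2-D cross-correlation of a single-channel image."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def roi_weighted_conv(image, roi_mask, kernel, gain=2.0):
    """Convolve, then amplify responses whose window centre lies in an ROI.

    Assumption: 'increasing' the ROI responses is modelled as multiplying
    them by a fixed gain; the paper may use a different scheme.
    """
    response = conv2d_valid(image, kernel)
    kh, kw = kernel.shape
    # Crop the mask so that each output position lines up with the
    # centre pixel of the window that produced it.
    top, left = kh // 2, kw // 2
    mask = roi_mask[top:top + response.shape[0], left:left + response.shape[1]]
    return response * np.where(mask, gain, 1.0)


# Toy input: a 5x5 image of ones with a single ROI pixel at the centre.
image = np.ones((5, 5))
kernel = np.ones((3, 3))
roi_mask = np.zeros((5, 5), dtype=bool)
roi_mask[2, 2] = True

out = roi_weighted_conv(image, roi_mask, kernel, gain=2.0)
print(out)
```

In this toy case only the output position whose window is centred on the marked pixel is amplified; every other response is left unchanged. A trainable version would typically learn the kernel, and possibly the gain, rather than fixing them.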