Emotion recognition is an active and important topic in image and signal processing. In this paper, we propose a multi-level fusion method that combines visual information and physiological signals for emotion recognition. For the visual modality, we propose a serial fusion of two-stage features to enhance the representation of facial expressions in a video sequence: a Neural Aggregation Network is integrated with Convolutional Neural Network feature maps to emphasize the most emotionally salient frames. For the physiological modality, we propose a parallel fusion scheme to enrich the representation of the electroencephalogram (EEG) signals: frequency features extracted as Linear-Frequency Cepstral Coefficients (LFCC) are augmented with a signal-complexity measure, Sample Entropy (SampEn). In the classification stage, we fuse the visual and physiological information at both the feature level and the decision level. Experimental results validate the effectiveness of the proposed multi-level, multi-modal feature representation method.
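The abstract mentions Sample Entropy (SampEn) as the complexity measure for the EEG branch. As a minimal sketch of that standard statistic (not the paper's own implementation; parameters `m=2` and `r = 0.2 * std` are common defaults assumed here, not values taken from the paper), SampEn counts pairs of length-`m` templates within Chebyshev tolerance `r`, repeats the count for length `m+1`, and returns the negative log of the ratio:

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """Sample Entropy SampEn(m, r) of a 1-D signal.

    B = number of template pairs of length m within Chebyshev distance r,
    A = same count for length m + 1; SampEn = -ln(A / B).
    Self-matches are excluded by only comparing i against j > i.
    """
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * x.std()  # common default tolerance (assumption, not from the paper)
    n = len(x)

    def count_matches(length):
        # All overlapping templates of the given length.
        templates = np.array([x[i:i + length] for i in range(n - length)])
        count = 0
        for i in range(len(templates)):
            # Chebyshev distance to all *later* templates (no self-matches).
            d = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(d <= r))
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf
```

A regular signal (e.g. a sine wave) yields a low SampEn, while white noise yields a markedly higher value, which is why SampEn can complement a purely spectral feature such as LFCC.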