In this paper, we propose a novel deep learning framework, the spatial-temporal recurrent neural network (STRNN), which integrates feature learning from both the spatial and the temporal information of signal sources into a unified spatial-temporal dependency model. In STRNN, to capture spatially co-occurring variations of human emotions, a multidirectional recurrent neural network (RNN) layer captures long-range contextual cues by traversing the spatial regions of each temporal slice along different directions. A bidirectional temporal RNN layer then learns discriminative features characterizing the temporal dependencies of the sequences produced by the spatial RNN layer. To further select the salient regions that are most discriminative for emotion recognition, we impose sparse projection on the hidden states of both the spatial and temporal domains, improving the model's discriminant ability. The proposed two-layer RNN model thus provides an effective way to exploit both the spatial and temporal dependencies of the input signals for emotion recognition. Experimental results on public electroencephalogram and facial-expression emotion datasets demonstrate that the proposed STRNN method is more competitive than state-of-the-art methods.
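The two-layer idea above can be illustrated with a minimal NumPy sketch: a spatial RNN traverses the regions of each temporal slice in two directions, and a bidirectional temporal RNN then runs over the per-slice summaries. All dimensions, weights, and the plain tanh cell here are illustrative assumptions, not the paper's actual architecture (which also includes the sparse projection step).

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_pass(seq, Wx, Wh, b):
    """Plain tanh RNN over a sequence of feature vectors; returns all states."""
    h = np.zeros(Wh.shape[0])
    states = []
    for x in seq:
        h = np.tanh(Wx @ x + Wh @ h + b)
        states.append(h)
    return np.stack(states)

def strnn_sketch(clip, d_hidden=16):
    """clip: (T, R, d) -- T temporal slices, each with R spatial regions of
    d-dim features. Spatial layer: traverse regions of each slice in two
    directions; temporal layer: bidirectional RNN over slice summaries."""
    T, R, d = clip.shape
    # Hypothetical randomly initialized weights (untrained, for shape only)
    Wx_s = rng.normal(scale=0.1, size=(d_hidden, d))
    Wh_s = rng.normal(scale=0.1, size=(d_hidden, d_hidden))
    b_s = np.zeros(d_hidden)
    slice_feats = []
    for t in range(T):
        fwd = rnn_pass(clip[t], Wx_s, Wh_s, b_s)[-1]        # forward traversal
        bwd = rnn_pass(clip[t][::-1], Wx_s, Wh_s, b_s)[-1]  # reverse traversal
        slice_feats.append(np.concatenate([fwd, bwd]))
    slice_feats = np.stack(slice_feats)                     # (T, 2*d_hidden)
    # Bidirectional temporal RNN over the slice-level features
    Wx_t = rng.normal(scale=0.1, size=(d_hidden, 2 * d_hidden))
    Wh_t = rng.normal(scale=0.1, size=(d_hidden, d_hidden))
    b_t = np.zeros(d_hidden)
    f = rnn_pass(slice_feats, Wx_t, Wh_t, b_t)[-1]
    b = rnn_pass(slice_feats[::-1], Wx_t, Wh_t, b_t)[-1]
    return np.concatenate([f, b])                           # clip-level feature

clip = rng.normal(size=(10, 8, 4))   # 10 slices, 8 regions, 4-dim features
feat = strnn_sketch(clip)
print(feat.shape)                    # (32,)
```

In the full model this clip-level feature would feed a classifier, with the sparse projection selecting discriminative hidden units.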
To explore human emotions, in this paper we design and build a multi-modal physiological emotion database that collects four modalities of physiological signals: electroencephalogram (EEG), galvanic skin response, respiration, and electrocardiogram (ECG). To reduce the influence of culture-dependent elicitation materials and reliably evoke the desired emotions, we assembled an emotion-elicitation material database drawn from more than 1,500 video clips. Through extensive, rigorous manual labeling, we carefully chose 28 videos, assessed by psychological methods, as standardized elicitation samples. Participants' physiological signals were recorded synchronously while they watched these standardized video clips, which cover six discrete emotions and a neutral state. Under three classification protocols, different feature extraction methods and classifiers (support vector machine and k-nearest neighbor) were used to recognize the physiological responses to different emotions, providing the baseline results. We also present a novel attention-long short-term memory (A-LSTM) network, which strengthens the contribution of informative sequence segments to extract more discriminative features. In addition, we investigate correlations between the EEG signals and the participants' ratings. The database has been made publicly available to encourage other researchers to evaluate their own emotion estimation methods on it.
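The attention mechanism in an A-LSTM can be pictured as weighting each time step's hidden state by a learned score, so informative segments dominate the pooled feature. The sketch below is a minimal, hedged illustration using random stand-in hidden states and a plain scoring vector; it is not the paper's A-LSTM, whose recurrent cell and attention parameterization may differ.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_pool(states, w):
    """Score each time step's hidden state, normalize the scores, and
    return the attention-weighted sum of states."""
    scores = softmax(states @ w)     # one normalized weight per step
    return scores, scores @ states   # (T,), (d,)

# Hypothetical toy setup: 20 time steps of 8-dim recurrent hidden states
states = rng.normal(size=(20, 8))
w = rng.normal(size=8)               # stand-in for a learned attention vector
alpha, feat = attention_pool(states, w)
print(round(alpha.sum(), 6), feat.shape)   # 1.0 (8,)
```

Sequences whose hidden states align with the attention vector receive larger weights, which is the sense in which "useful sequences" are strengthened.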
Micro-expression recognition aims to infer the genuine emotions that people try to conceal from facial video clips. It is a very challenging task because micro-expressions have very low intensity and short duration, which makes them difficult to observe. Recently, researchers have designed various spatiotemporal descriptors to describe micro-expressions. Notably, to better capture low-intensity facial muscle movement, a fixed spatial division grid, for example 8 × 8, is commonly used to partition the facial images into facial blocks before extracting descriptors. However, it is hard to choose an ideal division grid for different micro-expression samples, because the division grid affects the discriminative ability of the spatiotemporal descriptors. To address this problem, in this paper we design a hierarchical spatial division scheme for spatiotemporal descriptor extraction. With the proposed scheme, determining which division grid is most suitable for a given micro-expression dataset is no longer a problem. Furthermore, we propose a kernelized group sparse learning (KGSL) model to process the hierarchical-scheme-based spatiotemporal descriptors so that they are more effective for micro-expression recognition. To evaluate the proposed method, consisting of the hierarchical-scheme-based spatiotemporal descriptors and KGSL, extensive experiments are conducted on two public micro-expression databases: CASME II and SMIC. Compared with many recent state-of-the-art approaches, our method achieves more promising recognition results.
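The hierarchical division idea can be sketched as follows: instead of committing to one grid, partition the face at several grid levels and concatenate the per-block descriptors, leaving grid selection to the downstream learner (KGSL in the paper). The grid levels, the toy mean/std descriptor, and the image size below are illustrative assumptions; real systems extract spatiotemporal descriptors such as LBP-TOP per block.

```python
import numpy as np

def block_descriptor(block):
    """Toy 2-dim stand-in descriptor (mean and std of the block).
    A real pipeline would use a spatiotemporal descriptor here."""
    return np.array([block.mean(), block.std()])

def hierarchical_descriptors(img, grids=(1, 2, 4, 8)):
    """Partition `img` with every grid in `grids` (1x1 up to 8x8) and
    concatenate the per-block descriptors, so no single division
    has to be chosen in advance."""
    feats = []
    for g in grids:
        for row in np.array_split(img, g, axis=0):
            for blk in np.array_split(row, g, axis=1):
                feats.append(block_descriptor(blk))
    return np.concatenate(feats)

img = np.random.default_rng(2).normal(size=(64, 64))
feat = hierarchical_descriptors(img)
# blocks: 1 + 4 + 16 + 64 = 85; with a 2-dim descriptor -> 170 dims
print(feat.shape)                   # (170,)
```

The group structure (one group of dimensions per grid level or block) is what a group-sparse model like KGSL can then exploit to emphasize the most discriminative divisions.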