Background
The emotional state of an individual is difficult to identify automatically, and interest in emotion recognition has grown rapidly in recent years. Many technologies have been developed to identify emotional expression from facial expressions, vocal expressions, physiological signals, and body expressions. Among these, facial expression is especially informative for recognition using multiple modalities. Understanding facial emotions has applications in mental well-being, decision-making, and even social change, as emotions play a crucial role in our lives. Recognition is complicated by the high dimensionality of the data and the non-linear interactions across modalities. Moreover, the way people express emotion varies widely, so identifying discriminative features remains challenging; deep learning models help overcome these limitations.

Methods
This work addresses facial emotion recognition with a deep learning model, the proposed Residual Fused-Graph Convolution Network (RF-GCN). The multimodal input comprises video and an Electroencephalogram (EEG) signal. A Non-Local Means (NLM) filter is used to pre-process the input video frames. Features are extracted from both the pre-processed video frames and the input EEG signals, after which feature selection is carried out using the chi-square test. Finally, facial emotion and its type are determined by RF-GCN, which combines a Deep Residual Network (DRN) with a Graph Convolutional Network (GCN).

Results
RF-GCN was evaluated using accuracy, recall, and precision, achieving superior values of 91.6%, 96.5%, and 94.7%, respectively.

Conclusions
RF-GCN captures the nuanced relationships between different emotional states and improves recognition accuracy. The model is trained and evaluated on a dataset that reflects real-world conditions.
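The NLM pre-processing step mentioned in the Methods can be illustrated with a minimal sketch: each pixel is replaced by a weighted average of pixels in a search window, with weights based on patch similarity. This is a generic, numpy-only toy implementation (the `patch`, `search`, and `h` parameters are illustrative defaults, not the paper's settings).

```python
import numpy as np

def nlm_denoise(img, patch=1, search=3, h=0.1):
    """Minimal Non-Local Means on a 2-D grayscale image: each pixel is a
    weighted average over a search window, weighted by patch similarity."""
    img = np.asarray(img, dtype=float)
    pad = patch + search
    padded = np.pad(img, pad, mode='reflect')
    H, W = img.shape
    out = np.zeros_like(img)
    for i in range(H):
        for j in range(W):
            ci, cj = i + pad, j + pad
            ref = padded[ci - patch:ci + patch + 1,
                         cj - patch:cj + patch + 1]
            weight_sum, acc = 0.0, 0.0
            for di in range(-search, search + 1):
                for dj in range(-search, search + 1):
                    ni, nj = ci + di, cj + dj
                    cand = padded[ni - patch:ni + patch + 1,
                                  nj - patch:nj + patch + 1]
                    d2 = np.mean((ref - cand) ** 2)   # patch distance
                    w = np.exp(-d2 / (h * h))          # similarity weight
                    weight_sum += w
                    acc += w * padded[ni, nj]
            out[i, j] = acc / weight_sum
    return out

# toy usage: smooth a small noisy frame
rng = np.random.default_rng(0)
noisy = 0.1 * rng.normal(size=(8, 8))
smoothed = nlm_denoise(noisy)
```

In practice a video pipeline would apply such a filter frame by frame (production code would use an optimized library routine rather than this double loop).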
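The chi-square feature-selection step can likewise be sketched in a few lines. The matrix `X`, labels `y`, and `k` below are toy placeholders, not the paper's actual extracted features; this is a minimal numpy-only sketch that assumes non-negative feature values, as the chi-square test requires.

```python
import numpy as np

def chi2_scores(X, y):
    """Chi-square score per feature for a non-negative feature matrix X
    and integer class labels y: compares each feature's observed per-class
    totals against the totals expected under class-independence."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    observed = np.array([X[y == c].sum(axis=0) for c in classes])
    class_prob = np.array([(y == c).mean() for c in classes])
    expected = np.outer(class_prob, X.sum(axis=0))
    return ((observed - expected) ** 2 / expected).sum(axis=0)

def select_top_k(X, y, k):
    """Indices of the k features with the highest chi-square score."""
    top = np.argsort(chi2_scores(X, y))[::-1][:k]
    return np.sort(top)

# toy example: 4 samples, 3 features, 2 classes; features 0 and 1 are
# class-dependent, feature 2 is roughly uniform across classes
X = np.array([[1.0, 0.0, 3.0],
              [2.0, 0.1, 4.0],
              [0.0, 5.0, 3.5],
              [0.1, 6.0, 4.0]])
y = np.array([0, 0, 1, 1])
kept = select_top_k(X, y, k=2)   # keeps the class-dependent features
```

The selected indices would then be used to slice both the video-frame and EEG feature matrices before classification.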
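The GCN component of RF-GCN builds on the standard graph-convolution layer, H' = ReLU(D^(-1/2)(A+I)D^(-1/2) H W). The sketch below shows only this generic building block; the paper's residual fusion with a DRN is not reproduced here, and the toy graph and weights are invented for illustration.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One generic graph-convolution layer:
    H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # symmetric normalization
    return np.maximum(0.0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

# toy graph: 3 nodes in a chain, 2-dim node features, 2 output channels
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 2))                 # random toy weights
H_next = gcn_layer(A, H, W)
```

Stacking such layers lets node features aggregate information from progressively larger neighborhoods, which is how a GCN can model relationships between emotional states.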