Learning to disentangle emotion factors for facial expression recognition in the wild

Zhu, Qingsheng; Gao, Lijian; Song, Heping; Mao, Qirong

doi:10.1002/int.22391

Cited by 11 publications

(7 citation statements)

References 44 publications


“…Finally, reference exemplar-based algorithms based on unsupervised disentanglement learning [107][108][109] are becoming a promising research direction. Compared with only manually changing the attribute vector, this method directly learns the image-to-image translation along with the attributes, and then manipulates these attributes using a simple traversal across regularization dimensions, so that images with more realistic details can be generated.…”

Section: Discussion (mentioning)

confidence: 99%

Optimizing Few-Shot Learning Based on Variational Autoencoders

Wei

Mahmood

2021

Entropy


Despite the importance of few-shot learning, the lack of labeled training data in the real world makes it extremely challenging for existing machine learning methods because this limited dataset does not well represent the data variance. In this research, we suggest employing a generative approach using variational autoencoders (VAEs), which can be used specifically to optimize few-shot learning tasks by generating new samples with more intra-class variations on the Labeled Faces in the Wild (LFW) dataset. The purpose of our research is to increase the size of the training dataset using various methods to improve the accuracy and robustness of the few-shot face recognition. Specifically, we employ the VAE generator to increase the size of the training dataset, including the basic and the novel sets while utilizing transfer learning as the backend. Based on extensive experimental research, we analyze various data augmentation methods to observe how each method affects the accuracy of face recognition. The face generation method based on VAEs with perceptual loss can effectively improve the recognition accuracy rate to 96.47% using both the base and the novel sets.
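The citation statement above describes manipulating learned attributes by "a simple traversal across regularization dimensions". A minimal sketch of such a traversal follows, assuming a trained decoder is available as a plain function from a latent vector to an output. The names `traverse_latent` and `toy_decoder` are hypothetical, and the random linear decoder is only a stand-in for a real VAE decoder, not the method of either paper.

```python
import numpy as np

def traverse_latent(decode, z, dim, values):
    """Decode copies of z in which only coordinate `dim` is varied."""
    z = np.asarray(z, dtype=np.float64)
    outputs = []
    for v in values:
        z_mod = z.copy()      # leave all other latent coordinates untouched
        z_mod[dim] = v        # move along a single latent dimension
        outputs.append(decode(z_mod))
    return np.stack(outputs)

# Hypothetical stand-in decoder (assumption): a fixed random linear map
# followed by tanh, mapping a 4-D latent vector to a 16-D "image".
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 4))

def toy_decoder(z):
    return np.tanh(W @ z)

# Sweep latent dimension 2 over seven evenly spaced values.
frames = traverse_latent(toy_decoder, np.zeros(4), dim=2,
                         values=np.linspace(-3.0, 3.0, 7))
```

With a disentangled model, each row of `frames` would ideally differ from its neighbors in one attribute only; with this toy decoder the rows simply vary smoothly along one direction.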


“…
ResNet [34]                  2016  72.40
CPC [36]                     2018  71.35
SHCNN [37]                   2019  69.10
Fa-Net [38]                  2019  71.10
BReG-NeXt-50 [39]            2020  71.53
DisEmoNet [40]               2021  71.72
VGGNet [41]                  2021  73.28
Landmark-guided GCNN [35]    2022  73.26
Ours                         2022  74.23

FERPLUS Comparison: Table 2 displays the outcomes of a comparison of this paper's approach using the FERPLUS dataset with other state-of-the-art techniques. We compared our model with other CNN methods, such as ResNet+VGG [42], SENet [43], SHCNN [37], RAN [19], VTFF [44], ADC-Net [45], and the latest methods CERN [46] and A-MobileNet [47].…”

Section: Methods (mentioning)

confidence: 99%

“…
                   Year  Accuracy (%)
gACNN [18]         2018  85.07
APM-VGG [57]       2019  85.17
MA-Net [56]        2020  88.42
DisEmoNet [40]     2020  83.78
RAN [19]           2020  86.90
…”

Section: Methods (mentioning)

confidence: 99%

Facial Expression Recognition Methods in the Wild Based on Fusion Feature of Attention Mechanism and LBP

Lei

Lin

et al. 2023

Sensors


Facial expression methods play a vital role in human–computer interaction and other fields, but there are factors such as occlusion, illumination, and pose changes in wild facial recognition, as well as category imbalances between different datasets, that result in large variations in recognition rates and low accuracy rates for different categories of facial expression datasets. This study introduces RCL-Net, a method of recognizing wild facial expressions that is based on an attention mechanism and LBP feature fusion. The structure consists of two main branches, namely the ResNet-CBAM residual attention branch and the local binary feature (LBP) extraction branch (RCL-Net). First, by merging the residual network and hybrid attention mechanism, the residual attention network is presented to emphasize the local detail feature information of facial expressions; the significant characteristics of facial expressions are retrieved from both channel and spatial dimensions to build the residual attention classification model. Second, we present a locally improved residual network attention model. LBP features are introduced into the facial expression feature extraction stage in order to extract texture information on expression photographs in order to emphasize facial feature information and enhance the recognition accuracy of the model. Lastly, experimental validation is performed using the FER2013, FERPLUS, CK+, and RAF-DB datasets, and the experimental results demonstrate that the proposed method has superior generalization capability and robustness in the laboratory-controlled environment and field environment compared to the most recent experimental methods.
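The abstract above relies on local binary pattern (LBP) features to capture texture. As a rough illustration, here is a sketch of the basic 3x3 LBP operator and its histogram in NumPy. The function names, the clockwise neighbor ordering, and the `>=` comparison are assumptions chosen for this example; the RCL-Net paper may use a different LBP variant.

```python
import numpy as np

def lbp_codes(gray):
    """Basic 3x3 LBP code for every interior pixel of a 2-D grayscale array."""
    gray = np.asarray(gray)
    h, w = gray.shape
    center = gray[1:-1, 1:-1]
    # Neighbor offsets (dy, dx), clockwise from top-left; neighbor k sets bit k.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        # Shifted view of the image aligned with the interior pixels.
        neighbor = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set the bit where the neighbor is at least as bright as the center.
        codes += (neighbor >= center).astype(np.uint8) * np.uint8(1 << bit)
    return codes

def lbp_histogram(gray):
    """Normalized 256-bin histogram of LBP codes, usable as a texture feature."""
    codes = lbp_codes(gray)
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()
```

In a fusion model of the kind the abstract describes, a texture descriptor like this histogram (or the LBP code map itself) would be combined with the CNN branch's features; how that fusion is done is not specified here.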


“…Motivated by the achievements of emotional conversion in voice [33, 34] and face expression [35], we propose the emotional gait conversion approach to transform natural gaits into emotional gaits by separating identity and emotion representations for data augmentation. The contributions of this work can be summarized as follows: We introduce a MTL discriminator for gait identity and emotion joint learning, which takes into account nonverbal communication clues to enhance HRI. We propose a novel emotional gait conversion model with adversarial loss and cycle consistency loss to realize the mutual transformation between natural gait and emotional gait. We propose two kinds of data augmentation strategies by the emotional conversion model to increase the amount and diversity of the existing restricted dataset. We present an augmented synthetic dataset of human emotional gait, validated by a multitask classifier and achieved a corresponding 2.1% and 6.8% absolute increase in identity recognition and emotion recognition, respectively. …”

Section: Introduction (mentioning)

confidence: 99%

Data augmentation by separating identity and emotion representations for emotional gait recognition

Sheng

2023

Robotica


Human-centered intelligent human–robot interaction can transcend the traditional keyboard and mouse and have the capacity to understand human communicative intentions by actively mining implicit human clues (e.g., identity information and emotional information) to meet individuals’ needs. Gait is a unique biometric feature that can provide reliable information to recognize emotions even when viewed from a distance. However, the insufficient amount and diversity of training data annotated with emotions severely hinder the application of gait emotion recognition. In this paper, we propose an adversarial learning framework for emotional gait dataset augmentation, with which a two-stage model can be trained to generate a number of synthetic emotional samples by separating identity and emotion representations from gait trajectories. To our knowledge, this is the first work to realize the mutual transformation between natural gait and emotional gait. Experimental results reveal that the synthetic gait samples generated by the proposed networks are rich in emotional information. As a result, the emotion classifier trained on the augmented dataset is competitive with state-of-the-art gait emotion recognition works.
