2019
DOI: 10.1109/taffc.2017.2695999
Multi-Objective Based Spatio-Temporal Feature Representation Learning Robust to Expression Intensity Variations for Facial Expression Recognition

Cited by 196 publications (119 citation statements)
References 37 publications
“…• Learning temporal feature representations. To learn representations of the temporal dynamics found in audio [63], sequences of images [64], and physiological measurements [55], DNNs and especially RNNs are successfully applied.…”
Section: Towards Learning Deep Models Of Affect
confidence: 99%
“…More recent studies combine RNNs with deep methods for spatial feature learning discussed in Section 3.1.1, by adopting deep features from the last layer of a CNN trained for affect recognition (e.g., [106], [18], [64]). We see both CNN-RNN (e.g., [18], [90]) and CNN-LSTM (e.g., [108], [105], [64]) architectures, with CNN-LSTM being the more frequent choice among the reviewed studies. Global temporal modeling is found to lead to improved accuracies when compared with simpler methods such as pooling of spatial features (e.g., [152], [90], [108]).…”
Section: Learning Temporal Features For FER
confidence: 99%
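The CNN-LSTM pattern described in this excerpt can be illustrated with a minimal sketch: a small per-frame CNN produces spatial features, and an LSTM models the clip globally before classification. This is an illustrative PyTorch example under assumed settings, not the architecture of any cited paper; the class name CnnLstmFER, the layer sizes, and the seven-class output are placeholders.

# Minimal CNN-LSTM sketch for sequence-based FER (illustrative only;
# not the architecture of the cited papers). Assumes PyTorch.
import torch
import torch.nn as nn


class CnnLstmFER(nn.Module):
    """Per-frame CNN features -> LSTM over time -> expression logits."""

    def __init__(self, num_classes=7, feat_dim=128, hidden_dim=64):
        super().__init__()
        # Small frame-level CNN; real systems would use a deeper backbone.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
            nn.Flatten(), nn.Linear(32 * 4 * 4, feat_dim), nn.ReLU(),
        )
        # Global temporal modelling over the per-frame features.
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, clips):
        # clips: (batch, time, channels, height, width)
        b, t, c, h, w = clips.shape
        feats = self.cnn(clips.reshape(b * t, c, h, w)).reshape(b, t, -1)
        _, (h_n, _) = self.lstm(feats)    # h_n: (1, batch, hidden_dim)
        return self.classifier(h_n[-1])   # one set of logits per clip


if __name__ == "__main__":
    model = CnnLstmFER()
    dummy = torch.randn(2, 16, 1, 64, 64)  # 2 clips of 16 grayscale frames
    print(model(dummy).shape)              # torch.Size([2, 7])

Pooling the per-frame features instead of running the LSTM would give the simpler baseline the excerpt contrasts against.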
“…Face alignment is a traditional pre-processing step in many face-related recognition tasks. We list some well-known approaches [16], [63] (e.g., 3000 fps [64] with 68 landmarks, used in [55]; Incremental [65] with 49 landmarks, used in [66]; and the deep-learning cascaded CNN [67] and MTCNN [69] with 5 landmarks, fast with good/very good performance, used in [68] and [70], [71]) and publicly available implementations that are widely used in deep FER. Given a series of training data, the first step is to detect the face and then to remove background and non-face areas.…”
Section: Face Alignment
confidence: 99%
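As a concrete illustration of the detect-then-crop step mentioned in this excerpt, the sketch below uses OpenCV's bundled Haar cascade as a stand-in detector; the quoted survey favours stronger detectors and aligners such as MTCNN, and the function name crop_face and the input path are placeholders.

# Minimal face detection + crop pre-processing sketch (illustrative only).
# Uses OpenCV's bundled Haar cascade as a stand-in for stronger
# detectors/aligners (e.g., MTCNN) discussed in the quoted survey.
import cv2

# Haar cascade XML shipped with opencv-python.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)


def crop_face(image_path, out_size=64):
    """Detect the largest face, crop it, and resize to a fixed size."""
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    boxes = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(boxes) == 0:
        return None  # no face found; callers may skip the frame
    # Keep the largest detection, discarding background and non-face areas.
    x, y, w, h = max(boxes, key=lambda b: b[2] * b[3])
    face = gray[y:y + h, x:x + w]
    return cv2.resize(face, (out_size, out_size))


# Usage (path is a placeholder):
# face = crop_face("frame_0001.png")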
“…Cascaded networks: By combining the powerful perceptual vision representations learned by CNNs with the strength of LSTMs for variable-length inputs and outputs, Donahue et al. [204] proposed a model that is deep both spatially and temporally, cascading the outputs of CNNs with LSTMs for various vision tasks involving time-varying inputs and outputs. Similar to this hybrid network, many cascaded networks have been proposed for FER (e.g., [66], [108], [190], [205]).…”
Section: RNN and C3D
confidence: 99%
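The excerpt highlights the LSTM's strength with variable-length inputs. One common way to batch such sequences (an assumption here, not a detail from the cited works) is PyTorch's padding-and-packing utilities; the feature dimension and clip lengths below are arbitrary.

# Sketch of feeding variable-length feature sequences to an LSTM
# (illustrative; dimensions and lengths are arbitrary). Assumes PyTorch.
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_sequence

feat_dim, hidden_dim = 128, 64
lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)

# Three clips with different numbers of frames, already encoded by a CNN.
seqs = [torch.randn(t, feat_dim) for t in (24, 17, 9)]
lengths = torch.tensor([s.shape[0] for s in seqs])

# Pad to a common length, then pack so the LSTM skips the padding.
padded = pad_sequence(seqs, batch_first=True)            # (3, 24, 128)
packed = pack_padded_sequence(padded, lengths, batch_first=True,
                              enforce_sorted=False)
_, (h_n, _) = lstm(packed)
clip_embeddings = h_n[-1]                                 # (3, 64)
print(clip_embeddings.shape)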
“…Recently, a few research efforts have been made regarding facial dynamic feature encoding for facial analysis [9,25,6,24]. It is generally known that the dynamic features of local regions are valuable for facial trait estimation [9,6]. Usually, the motion of a local facial region during a facial expression is related to the motion of other facial regions [39,43].…”
Section: Introduction
confidence: 99%