2021 IEEE Winter Conference on Applications of Computer Vision (WACV) 2021
DOI: 10.1109/wacv48630.2021.00307
|View full text |Cite
|
Sign up to set email alerts
|

Temporal Stochastic Softmax for 3D CNNs: An Application in Facial Expression Recognition

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
4
1

Citation Types

0
3
0

Year Published

2021
2021
2024
2024

Publication Types

Select...
5
2
2

Relationship

0
9

Authors

Journals

citations
Cited by 11 publications
(5 citation statements)
references
References 44 publications
0
3
0
Order By: Relevance
“…Some later studies [18,19] further proved that the models using the CNN-RNN scheme are practical spatial-temporal feature extractors. In addition, independent 3D CNN architecture and its variants also performed competitively in video-based FER tasks [20,21].…”
Section: Video-based Facial Expression Recognitionmentioning
confidence: 99%
“…Some later studies [18,19] further proved that the models using the CNN-RNN scheme are practical spatial-temporal feature extractors. In addition, independent 3D CNN architecture and its variants also performed competitively in video-based FER tasks [20,21].…”
Section: Video-based Facial Expression Recognitionmentioning
confidence: 99%
“…Although these methods perform well by selecting peak frames, they neglect the temporal dynamics and correlation among facial frames. Different from static frame-based methods, dynamic sequence based methods usually use 3D convolution neural networks (3DCNN) [7], long-short term memory (LSTM) [8] to learn the spatiotemporal relationships, which can model long-term dependencies and improve the performance of DFER. The performances of these methods are still far from being satisfactory, because of occlusions, variant head poses, poor illumination and other unexpected issues in real-world scenes.…”
Section: Introductionmentioning
confidence: 99%
“…Different from static frame-based methods, dynamic sequence based methods usually use 3D convolution neural networks (3DCNN) [2], long-short term memory (LSTM) [54] to learn the spatio-temporal relationships, which can model long-term dependencies and improve the performance of DFER. For instance, Kim et al [23] propose to use a LSTM network to learn the temporal characteristics of the spatial features, which achieves higher recognition rates.…”
Section: Introductionmentioning
confidence: 99%
“…Although these methods employ the 3D CNN to capture the temporal correlation, they only use one or two layers with 3D convolutional filters and fail to model the complicated spatio-temporal correlation. Very recently, Ayral et al [2] propose to re-weight different clips and achieve a clip-aware 3D CNN for dynamic facial expression, which outperforms other methods on AFEW. A novel framework called EC-STFL [21] is also proposed to reduce the inter-class distance and enhance the intraclass correlation.…”
Section: Introductionmentioning
confidence: 99%