Proceedings of the 20th ACM International Conference on Multimodal Interaction 2018
DOI: 10.1145/3242969.3264980

An Occam's Razor View on Learning Audiovisual Emotion Recognition with Small Training Sets

Abstract: This paper presents a light-weight and accurate deep neural model for audiovisual emotion recognition. To design this model, the authors followed a philosophy of simplicity, drastically limiting the number of parameters learned from the target datasets and always choosing the simplest learning methods: i) transfer learning and low-dimensional space embedding reduce the dimensionality of the representations; ii) the visual temporal information is handled by a simple score-per-frame selection process, av…
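The score-per-frame selection mentioned in the abstract can be illustrated with a minimal sketch. The function name and the top-k averaging rule below are assumptions for illustration, not the authors' exact procedure: per-frame class scores from a visual backbone are reduced to a single clip-level prediction by keeping only the most confident frames per class.

```python
import numpy as np

def aggregate_frame_scores(frame_scores: np.ndarray, top_k: int = 5) -> np.ndarray:
    """Aggregate per-frame class scores (n_frames, n_classes) into one
    clip-level score vector by averaging each class's top-k frame scores.
    Illustrative stand-in for a score-per-frame selection step, not the
    paper's exact rule."""
    k = min(top_k, frame_scores.shape[0])
    # Sort each class's scores over frames (ascending), keep the k largest,
    # then average them per class.
    top = np.sort(frame_scores, axis=0)[-k:]
    return top.mean(axis=0)

# Three frames, two emotion classes (synthetic scores).
scores = np.array([[0.1, 0.9],
                   [0.8, 0.2],
                   [0.7, 0.3]])
clip_scores = aggregate_frame_scores(scores, top_k=2)
pred = int(np.argmax(clip_scores))
```

Averaging only the top-k frames (rather than all frames) discounts uninformative frames, which is one simple way to handle visual temporal information without a recurrent model.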

Cited by 32 publications (26 citation statements)
References 31 publications
“…For an exact and fair comparison, the numerical values of the conventional methods were quoted directly from previous studies [27, 29, 37, 58, 59, 60, 61]. CNN-RNN-based techniques [27, 29, 60] and 2D CNN-based ones [37, 58, 59, 61] were examined. Note that [37] used five-fold cross-validation jointly with a training set and validation set, so it could not be fairly compared with the others.…”
Section: Results
confidence: 99%
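The fairness caveat in the quote above can be made concrete with a short sketch (the helper and the synthetic data are assumptions, not taken from any of the cited papers): under k-fold cross-validation every sample eventually appears in a training split, so accuracies are not directly comparable with methods evaluated on a fixed held-out test set.

```python
import numpy as np

def kfold_indices(n_samples: int, n_splits: int):
    """Yield (train_idx, val_idx) pairs for contiguous k-fold splits."""
    folds = np.array_split(np.arange(n_samples), n_splits)
    for i in range(n_splits):
        val_idx = folds[i]
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, val_idx

seen_in_train = set()
for train_idx, _ in kfold_indices(10, 5):
    seen_in_train.update(train_idx.tolist())

# Every sample lands in training for k-1 of the k runs, unlike a protocol
# with a fixed held-out test split that the model never trains on.
all_trained = seen_in_train == set(range(10))
```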
“…Among the VSHNN models without the FS module, C3D and SAGRU of type B showed higher performance than the previous methods. For example, their accuracy improved by around 2.87% compared to a SOTA method [61].…”
Section: Results
confidence: 99%
“…[table excerpt] Method — Accuracy: CAKE [60] 68.9; DLP-CNN [58] 74.2; Vielzeuf et al. [73] 80; PG-CNN [14] 83. … patch augmentation. The proposed methods achieve accuracy higher than, or close to, the state-of-the-art methods; (4) a benefit of MFMP+ is observed on most in-the-lab expression datasets, as they contain a small number of samples.…”
Section: Methods
confidence: 99%
“…
Method | Source | RAF-DB | EmotioNet
NCMML [8] | SIP (2016) | 57.70% | -
Capsnet [4] | arXiv (2017) | 76.12% | 32.64%
Boosting-POOF [6] | FG (2017) | 73.19% | 46.27%
MRE-CNN [1] | ICANN (2017) | 76.73% | -
VGG16 [5] | CS (2014) | 80.96% | 45.59%
RC-DLP [9] | CVPR (2017) | 84.70% | -
Emotion classifier [10] | ICMI (2018) | 80.00% | -
GAN-Inpainting [11] | CVPR (2018) | 81.87% | -
DLP-CNN [2] | IEEE TIP (2019) | 84.13% | -
FERAtt [7] | arXiv (2019) | - | 48.63%
E2-Capsnet | - | 85.24% | 55.91%
Fig. 3: Visualizations of Capsnet [4], VGG16 [5], Boosting-POOF [6], FERAtt [7], and E2-Capsnet on EmotioNet by t-SNE.…”
Section: Methods
confidence: 99%