Proceedings of the 3rd International on Multimodal Sentiment Analysis Workshop and Challenge 2022
DOI: 10.1145/3551876.3554801

Transformer-based Non-Verbal Emotion Recognition

Cited by 3 publications (4 citation statements). References: 20 publications.
“…
Baseline (FAU) [1]: .2840 (.2828 ± .0016) | .2801 (.2777 ± .0017)
Baseline (VGGFACE 2) [1]: .2488 (.2441 ± .0027) | .1830 (.1985 ± .0088)
Resnet-18 [14]: .3893 (--) | -- (--)
Former-DFER+MLGCN [16]: .3454 (--) | -- (--)
ViPER [15]: .2978 (--) | .2859 (--)
FaceRNET [6]: .3590 (--) | .3607 (--)
…”
Section: Video · Unclassified
“…
Baseline [1]: .2382 (.2350 ± 0.0016) | .2029 (.2014 ± .0086)
Resnet-18 + DEEPSPECTRUM [14]: .3968 (--) | -- (--)
ViPER [15]: .3025 (--) | .2970 (--)
…”
Section: Multimodal · Unclassified
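The excerpts above do not label their score columns. The sketch below shows one way such numbers could be computed. It assumes the scores are Pearson correlations averaged over the emotion-intensity dimensions, as is common for emotional reaction intensity on the Hume-Reaction data. It also assumes the parenthesized values are the mean ± standard deviation over repeated training runs. The function names `mean_pearson` and `summarize_runs` are hypothetical, and the choice of population standard deviation (`ddof=0`) is also an assumption.

```python
import numpy as np


def mean_pearson(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Average Pearson correlation over emotion dimensions (columns).

    y_true, y_pred: arrays of shape (n_samples, n_emotions).
    A constant column yields NaN, as the Pearson coefficient is undefined there.
    """
    if y_true.shape != y_pred.shape or y_true.ndim != 2:
        raise ValueError("expected two arrays of identical shape (n_samples, n_emotions)")
    rhos = []
    for k in range(y_true.shape[1]):
        # np.corrcoef returns the 2x2 correlation matrix; entry [0, 1] is rho
        rhos.append(np.corrcoef(y_true[:, k], y_pred[:, k])[0, 1])
    return float(np.mean(rhos))


def summarize_runs(scores, ddof: int = 0):
    """Mean and standard deviation of one metric over several runs (assumed layout)."""
    arr = np.asarray(scores, dtype=float)
    return float(arr.mean()), float(arr.std(ddof=ddof))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = rng.uniform(size=(100, 7))               # hypothetical intensity labels
    preds = labels + rng.normal(scale=0.5, size=labels.shape)
    print(round(mean_pearson(labels, preds), 4))
    print(summarize_runs([0.28, 0.29, 0.27]))
```

In these excerpts, a headline value can fall outside its own mean (for example, .1830 against .1985 ± .0088). This suggests that the headline figure is a selected run rather than the best one, so the sketch reports only mean and spread.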
“…As an alternative vision-based strategy, we employ the DINO-trained ViT, which has been pretrained on the ImageNet-1K dataset in a self-supervised manner using the self-distillation with no labels (DINO) method [14]. This model has demonstrated its efficacy for various image-based tasks, including emotion recognition from facial expressions [16,58]. The model processes the extracted facial images and outputs a 384-dimensional embedding for each image.…”
Section: Vision Transformer (ViT) · Mentioning (confidence: 99%)
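The excerpt states only that the DINO ViT emits one 384-dimensional embedding per face image. A video-level model needs some way to combine these frame vectors. The sketch below uses plain mean pooling over frames to illustrate the shapes involved. This is an assumption and not necessarily what the citing work does, and random arrays stand in for real DINO embeddings. `pool_clip_embedding` and `EMBED_DIM` are hypothetical names.

```python
import numpy as np

EMBED_DIM = 384  # per-image embedding size stated in the excerpt


def pool_clip_embedding(frame_embeddings: np.ndarray) -> np.ndarray:
    """Mean-pool per-frame embeddings of shape (n_frames, 384) into one (384,) vector.

    Mean pooling is an illustrative assumption; it discards frame order.
    """
    if frame_embeddings.ndim != 2 or frame_embeddings.shape[1] != EMBED_DIM:
        raise ValueError(
            f"expected shape (n_frames, {EMBED_DIM}), got {frame_embeddings.shape}"
        )
    if frame_embeddings.shape[0] == 0:
        raise ValueError("need at least one frame")
    return frame_embeddings.mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(30, EMBED_DIM))  # stand-in for 30 face-crop embeddings
    clip_vec = pool_clip_embedding(frames)
    print(clip_vec.shape)
```

Mean pooling is order-invariant and cheap. A temporal model over the frame sequence would be the natural alternative when the order of facial expressions matters.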
“…Former-DFER+MLGCN [32] also proposes an end-to-end network framework that has achieved remarkable performance on the Hume-Reaction dataset. To better utilize multimodal information, ViPER [30] proposed a multimodal architecture for emotion recognition which achieves excellent performance on the ERI task as well.…”
Section: Emotional Reaction Intensity · Mentioning (confidence: 99%)