2021
DOI: 10.1109/mmul.2021.3080305
End-to-End Learning for Multimodal Emotion Recognition in Video With Adaptive Loss

Cited by 6 publications (3 citation statements)
References 18 publications
“…For all emotion classes and overall, our method achieves much better results than CAE-LR [25], showing the effectiveness of the contrastive loss in a multimodal setting compared to convolutional autoencoders. It is worth noting that, on average, our method is better than several fully supervised techniques: MESM [21], FE2E [21], Graph-MFN [43], [51], CIA [46]. Considering that these methods integrate relatively complex supervised techniques (attention mechanisms, transformers, graphs), the better performance of our method is very promising.…”
Section: Comparisons With the State-of-the-Art Methods
confidence: 84%
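The statement above credits a contrastive loss for the method's strong multimodal performance. As a minimal, hypothetical sketch (not the cited paper's actual objective, and with made-up function names), an InfoNCE-style contrastive loss over paired audio/video embeddings can be written in plain Python:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def contrastive_loss(audio_embs, video_embs, temperature=0.1):
    """InfoNCE-style loss: each audio embedding should be most similar
    to the video embedding at the same index (its positive pair), and
    dissimilar to all other (negative) video embeddings in the batch."""
    loss = 0.0
    for i, a in enumerate(audio_embs):
        logits = [cosine(a, v) / temperature for v in video_embs]
        # numerically stable log-sum-exp over all candidates
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denom - logits[i]  # = -log softmax(positive)
    return loss / len(audio_embs)
```

Correctly aligned modality pairs yield a lower loss than shuffled pairs, which is the signal that pulls matching representations together during training.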
“…There have been many attempts to apply end-to-end learning [27], [26], [50], [51], but only [21] compared a fully end-to-end method (defined as jointly optimizing the feature extraction and feature learning stages [21]) with two-phase pipelines (i.e., where feature extraction is independent from multimodal learning). Indeed, it is very common in the MER literature to apply the feature extraction step separately.…”
Section: Related Work
confidence: 99%
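The distinction drawn in [21] between fully end-to-end training and two-phase pipelines can be illustrated on a toy one-dimensional model (a hypothetical sketch, not any cited architecture): if the feature extractor is frozen at a poor initialization, the two-phase pipeline cannot recover, whereas joint optimization propagates the loss into both stages:

```python
def train(x, y, w1, w2, lr=0.1, steps=200, end_to_end=True):
    """Toy pipeline: feature extractor h = w1*x, predictor p = w2*h,
    squared-error loss (p - y)**2. Two-phase training freezes the
    extractor weight w1; end-to-end training updates both stages."""
    for _ in range(steps):
        h = w1 * x
        p = w2 * h
        err = p - y
        g2 = 2 * err * h        # d(loss)/d(w2)
        g1 = 2 * err * w2 * x   # d(loss)/d(w1), via the chain rule
        w2 -= lr * g2
        if end_to_end:
            w1 -= lr * g1
    return (w1 * x * w2 - y) ** 2  # final loss
```

With w1 initialized to 0, the extractor outputs nothing, so the frozen pipeline's predictor gradient is zero and the loss stays at 1.0 forever; end-to-end training escapes because the loss also flows into w1.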
“…As large models gain prominence, word embeddings and pre-trained models have seen significant success in sentiment analysis. Word2Vec [24] maps semantically similar words to similar vector spaces, while GloVe [25] derives semantic relationships between words based on global co-occurrence. ELMo [26] introduces context-aware embeddings, allowing word representations to vary based on their specific contexts within sentences.…”
Section: Related Work
confidence: 99%
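GloVe's premise, that global co-occurrence statistics encode semantic relationships, can be illustrated with raw co-occurrence vectors (a simplified sketch: GloVe itself fits log co-occurrence counts with a weighted least-squares factorization rather than using the counts directly):

```python
import math

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a vector of co-occurrence counts with every
    vocabulary word within a fixed window -- a crude stand-in for the
    global statistics that GloVe factorizes into dense embeddings."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    vecs = {w: [0.0] * len(vocab) for w in vocab}
    for s in sentences:
        for i, w in enumerate(s):
            for j in range(max(0, i - window), min(len(s), i + window + 1)):
                if j != i:
                    vecs[w][index[s[j]]] += 1.0
    return vecs

def cosine(u, v):
    """Cosine similarity between two co-occurrence vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

Words that appear in similar contexts ("cat" and "dot" both flanked by "the … sat") end up with near-identical vectors, while words from unrelated contexts score lower, which is the intuition behind co-occurrence-based embeddings.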