2022
DOI: 10.48550/arxiv.2201.05834
Preprint

Tailor Versatile Multi-modal Learning for Multi-label Emotion Recognition

Abstract: Multi-modal Multi-label Emotion Recognition (MMER) aims to identify various human emotions from heterogeneous visual, audio and text modalities. Previous methods mainly focus on projecting multiple modalities into a common latent space and learning an identical representation for all labels, which neglects the diversity of each modality and fails to capture richer semantic information for each label from different perspectives. Besides, associated relationships of modalities and labels have not been fully expl…
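
To make the contrast described in the abstract concrete, here is a minimal PyTorch sketch, not the authors' implementation: the module names, feature dimensions, and the attention-based label head are illustrative assumptions. It contrasts a common-latent-space baseline, which scores every emotion label from one fused vector, with a label-specific head that builds a separate representation per label.

```python
import torch
import torch.nn as nn

class SharedFusionBaseline(nn.Module):
    """Baseline style criticized in the abstract: project all modalities into one
    common latent space and score every label from the same fused vector."""
    def __init__(self, dims, hidden=128, num_labels=6):
        super().__init__()
        self.proj = nn.ModuleList([nn.Linear(d, hidden) for d in dims])
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, feats):  # feats: list of (batch, dim) tensors, one per modality
        fused = torch.stack([p(x) for p, x in zip(self.proj, feats)]).mean(0)  # (B, H)
        return self.classifier(fused)  # identical representation shared by all labels


class LabelSpecificHead(nn.Module):
    """Sketch of the label-specific alternative: each emotion label attends over the
    projected modalities and gets its own representation before scoring."""
    def __init__(self, dims, hidden=128, num_labels=6):
        super().__init__()
        self.proj = nn.ModuleList([nn.Linear(d, hidden) for d in dims])
        self.label_queries = nn.Parameter(torch.randn(num_labels, hidden))
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.score = nn.Linear(hidden, 1)

    def forward(self, feats):
        tokens = torch.stack([p(x) for p, x in zip(self.proj, feats)], dim=1)  # (B, M, H)
        queries = self.label_queries.unsqueeze(0).expand(tokens.size(0), -1, -1)  # (B, L, H)
        label_repr, _ = self.attn(queries, tokens, tokens)  # one vector per label: (B, L, H)
        return self.score(label_repr).squeeze(-1)  # (B, L) logits, one per emotion label


if __name__ == "__main__":
    # Hypothetical per-modality feature sizes for visual, audio and text streams.
    v, a, t = torch.randn(4, 35), torch.randn(4, 74), torch.randn(4, 300)
    shared = SharedFusionBaseline(dims=[35, 74, 300])
    per_label = LabelSpecificHead(dims=[35, 74, 300])
    print(shared([v, a, t]).shape)     # torch.Size([4, 6]) -- one fused vector scores all labels
    print(per_label([v, a, t]).shape)  # torch.Size([4, 6]) -- one representation per label
```

Multi-label training on top of either head would typically use a per-label binary cross-entropy loss (e.g. `nn.BCEWithLogitsLoss`).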

Cited by 1 publication (3 citation statements)
References: 36 publications
“…[Flattened table excerpt with baseline scores for RAVEN, HHMPN, TAILOR (Zhang et al., 2022), SIMM and ML-GCN.] We list performance of models leveraging alignment information in Table 11. i-Code, w/o using the alignment information between transcripts and audio, can outperform many baseline models using that information.…”
Section: A4 Results of Using Other Single-Modality Encoders
Citation type: mentioning
Confidence: 99%
“…i-Code, w/o using the alignment information between transcripts and audio, can outperform many baseline models using that information. [Flattened table excerpt with scores for MISA (Hazarika et al., 2020), RAVEN, MulT (Tsai et al., 2019), HHMPN, TAILOR (Zhang et al., 2022), SIMM and ML-GCN.]…”
Section: A4 Results of Using Other Single-Modality Encoders
Citation type: mentioning
Confidence: 99%