2020
DOI: 10.1609/aaai.v34i01.5347

Modality to Modality Translation: An Adversarial Representation Learning and Graph Fusion Network for Multimodal Fusion

Abstract: Learning joint embedding space for various modalities is of vital importance for multimodal fusion. Mainstream modality fusion approaches fail to achieve this goal, leaving a modality gap which heavily affects cross-modal fusion. In this paper, we propose a novel adversarial encoder-decoder-classifier framework to learn a modality-invariant embedding space. Since the distributions of various modalities vary in nature, to reduce the modality gap, we translate the distributions of source modalities into that of …
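The abstract describes an adversarial encoder-decoder-classifier that translates source-modality distributions toward a target modality. Below is a minimal PyTorch sketch of that idea, not the authors' released code: the modality dimensions, module sizes, and the single-step loss combination are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp(in_dim, out_dim, hid=128):
    return nn.Sequential(nn.Linear(in_dim, hid), nn.ReLU(), nn.Linear(hid, out_dim))

# Encoders map each modality into a shared embedding space; the decoder translates
# the source code back into the *target* modality; the discriminator tries to tell
# source-derived codes from target-derived ones, and the encoder learns to fool it.
enc_text, enc_audio = mlp(300, 128), mlp(74, 128)   # feature dims are placeholders
decoder = mlp(128, 74)                              # shared code -> audio features
discriminator = mlp(128, 1)
classifier = nn.Linear(128, 1)                      # downstream task head

text, audio = torch.randn(8, 300), torch.randn(8, 74)
label = torch.randn(8, 1)                           # dummy regression target

z_text = enc_text(text)
translation_loss = F.mse_loss(decoder(z_text), audio)        # modality translation
adversarial_loss = F.binary_cross_entropy_with_logits(       # push source codes toward
    discriminator(z_text), torch.ones(8, 1))                 # the target distribution
task_loss = F.mse_loss(classifier(z_text), label)
(translation_loss + adversarial_loss + task_loss).backward()
```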

Cited by 149 publications (81 citation statements)
References 33 publications
“…To address it, Liu et al (2018) presented the Efficient Low-rank Multimodal Fusion, which applies multimodal fusion using low-rank tensors to accelerate the fusion process. Mai et al (2020) proposed a graph fusion network to model unimodal, bimodal, and trimodal interactions successively.…”
Section: Related Work
confidence: 99%
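The quoted passage refers to the low-rank tensor trick of Liu et al. (2018). The following is a minimal PyTorch sketch of that style of fusion, not the published implementation (and not the Mai et al. graph fusion network); the rank, feature dimensions, and initialization scale are assumptions for illustration.

```python
import torch
import torch.nn as nn

class LowRankFusion(nn.Module):
    """Fuses per-modality vectors with modality-specific low-rank factors,
    avoiding the full outer-product fusion tensor."""
    def __init__(self, dims, out_dim, rank=4):
        super().__init__()
        # Appending a constant 1 to each input keeps unimodal and bimodal terms
        # inside the (implicit) trimodal outer product.
        self.factors = nn.ParameterList(
            [nn.Parameter(0.1 * torch.randn(rank, d + 1, out_dim)) for d in dims])
        self.rank_weights = nn.Parameter(0.1 * torch.randn(1, rank))
        self.bias = nn.Parameter(torch.zeros(1, out_dim))

    def forward(self, xs):
        batch = xs[0].shape[0]
        ones = torch.ones(batch, 1)
        fused = None
        for x, factor in zip(xs, self.factors):
            x1 = torch.cat([x, ones], dim=1)                 # (batch, d + 1)
            proj = torch.einsum('bd,rdo->rbo', x1, factor)   # (rank, batch, out)
            fused = proj if fused is None else fused * proj  # elementwise product
        # Weighted sum over the rank dimension gives the fused representation.
        return torch.matmul(self.rank_weights, fused.flatten(1)).view(batch, -1) + self.bias

fusion = LowRankFusion(dims=[300, 74, 35], out_dim=64)
text, audio, video = torch.randn(8, 300), torch.randn(8, 74), torch.randn(8, 35)
print(fusion([text, audio, video]).shape)   # torch.Size([8, 64])
```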
“…In Table 2, we only group some metrics associated with the IEMOCAP dataset, since most of the authors have tested their work on that collection. Some works present results for other datasets such as CMU-MOSEI (e.g., [63,66,84]) or RECOLA (e.g., [57]), but these works can only be grouped into small sets, and thus we think that the best overview of the field's architectures can only be seen in an IEMOCAP comparison.…”
Section: Aggregated Reported Results
confidence: 99%
“…CMU-MOSEI includes labels not only for the emotion recognition task but also for the (text-based) sentiment analysis problem. For this reason, it is mostly used for sentiment analysis architectures (e.g., [63,66,67,70,83-85]). However, its benefits must be adopted by researchers on emotion recognition in order to produce more robust models and gain a better insight into their actual performance.…”
Section: Datasets
confidence: 99%
“…An adversarial autoencoder was proposed to match the posterior vector of the hidden distribution with the prior distribution, and the domain-adversarial training of the neural network was proposed to improve the performance of the target domain [28]. These techniques can also be applied to multimodal sentiment analysis research to improve the performance of models [29].…”
Section: B. Heterogeneous Transfer Learning
confidence: 99%
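This passage points to adversarial autoencoders and domain-adversarial training as ways to close distribution gaps. Below is a minimal sketch of the gradient-reversal mechanism commonly used for domain-adversarial training, with modalities treated as domains; layer sizes, the lambda value, and the dummy labels are illustrative assumptions, not code from the cited works.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Flip the gradient so the feature extractor is trained to *confuse*
        # the domain (here: modality) classifier.
        return -ctx.lambd * grad_output, None

feature = nn.Sequential(nn.Linear(300, 128), nn.ReLU())   # shared feature extractor
domain_head = nn.Linear(128, 2)   # predicts which modality/domain a code came from
task_head = nn.Linear(128, 1)     # e.g., sentiment regression

x = torch.randn(16, 300)
domain_labels = torch.randint(0, 2, (16,))
z = feature(x)
task_loss = F.mse_loss(task_head(z), torch.randn(16, 1))
domain_loss = F.cross_entropy(domain_head(GradReverse.apply(z, 1.0)), domain_labels)
(task_loss + domain_loss).backward()   # domain gradient arrives reversed in `feature`
```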