2018
DOI: 10.48550/arxiv.1806.00064
Preprint

Efficient Low-rank Multimodal Fusion with Modality-Specific Factors

Cited by 69 publications (89 citation statements)
References 21 publications
“…Thus, how to aggregate information from multi-modal features is the main problem. Fusing data or features across different modalities has long been an active topic, and some classical methods [27]–[29] utilize linear embeddings or attention mechanisms to fuse multi-modal features. For example, Hori et al. [28] propose a multi-modal attention model that selectively fuses multi-modal features based on learned attention.…”
Section: Multi-modal Fusion (mentioning, confidence: 99%)
“…The key to successful fusion is how to reinforce the discriminative information while suppressing the irrelevant information among multi-modal features. To this end, some works [27]–[29] propose to use linear attention to selectively fuse multi-modal features. For example, Hori et al. [28] propose a multi-modal attention model that selectively fuses multi-modal features based on different attention factors.…”
Section: Introduction (mentioning, confidence: 99%)
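The attention-based fusion these statements describe can be made concrete with a short sketch. The PyTorch module below is a generic modality-level attention, not Hori et al.'s [28] actual model; the class name, dimensions, and projection scheme are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Fuse per-modality features with learned attention weights (illustrative sketch)."""

    def __init__(self, input_dims, shared_dim):
        super().__init__()
        # Project each modality into a shared space before scoring.
        self.projections = nn.ModuleList(
            [nn.Linear(d, shared_dim) for d in input_dims])
        self.score = nn.Linear(shared_dim, 1)

    def forward(self, features):
        # features: list of tensors, one per modality, each (batch, dim_m)
        projected = torch.stack(
            [proj(f) for proj, f in zip(self.projections, features)], dim=1)
        # projected: (batch, num_modalities, shared_dim)
        weights = torch.softmax(self.score(projected), dim=1)
        # Weighted sum over modalities -> (batch, shared_dim)
        return (weights * projected).sum(dim=1)


# usage with hypothetical text/audio/video feature sizes
fusion = AttentionFusion(input_dims=[300, 74, 35], shared_dim=128)
text, audio, video = torch.randn(8, 300), torch.randn(8, 74), torch.randn(8, 35)
print(fusion([text, audio, video]).shape)  # torch.Size([8, 128])
```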
“…for images) which allows us to leverage wider modality-specific information, and b) often, but not always, each individual modality is in principle enough to correctly predict the output. A plethora of neural architectures have been proposed to learn multimodal representations for sentiment classification. Models often rely on a fusion mechanism (Khan et al. 2012), tensor factorisation (Liu et al. 2018; Zadeh et al. 2019), or complex attention mechanisms (Zadeh et al. 2018a) that are fed with modality-specific representations.…”
Section: Multimodal Fusion (mentioning, confidence: 99%)
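The tensor-factorisation line of work referenced here starts from an outer product of modality embeddings. Below is a minimal sketch of that outer-product (TFN-style) fusion, assuming three modalities and a trailing constant 1 appended to each embedding; the function name and shapes are illustrative, not any cited paper's code.

```python
import torch

def tensor_fusion(text, audio, video):
    """Outer-product (TFN-style) fusion of three modality embeddings.

    Assumed shapes: text (batch, d_t), audio (batch, d_a), video (batch, d_v).
    Appending a constant 1 to each embedding lets the flattened product also
    retain unimodal and bimodal interaction terms.
    """
    batch = text.size(0)
    one = torch.ones(batch, 1, device=text.device)
    t = torch.cat([text, one], dim=1)   # (batch, d_t + 1)
    a = torch.cat([audio, one], dim=1)  # (batch, d_a + 1)
    v = torch.cat([video, one], dim=1)  # (batch, d_v + 1)
    # 3-way outer product per example: (batch, d_t+1, d_a+1, d_v+1)
    fused = torch.einsum('bi,bj,bk->bijk', t, a, v)
    # Flattened fusion vector; its size grows multiplicatively with dimensions.
    return fused.reshape(batch, -1)


# usage: small embeddings keep the flattened tensor manageable
text, audio, video = torch.randn(4, 32), torch.randn(4, 16), torch.randn(4, 16)
print(tensor_fusion(text, audio, video).shape)  # torch.Size([4, 9537]) = 33*17*17
```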
“…feature-rich yet efficient representations (Zadeh et al. 2017; Liu et al. 2018; Hazarika, Zimmermann, and Poria 2020). Recently, Rahman et al. (2020) used pre-trained transformer-based models (Tsai et al. 2019; Siriwardhana et al. 2020) to achieve state-of-the-art results on the multimodal sentiment benchmarks MOSI (Wöllmer et al. 2013) and MOSEI (Zadeh et al. 2018c).…”
Section: Introduction (mentioning, confidence: 99%)
“…TFN, however, conducts numerous dot-product operations in feature space, resulting in increased computation. Therefore, Liu et al. [5] proposed Low-rank Multimodal Fusion (LMF), which builds on TFN and improves computational efficiency by decomposing the high-order fusion tensors. Apart from manipulating geometric properties, auxiliary losses have also been used to aid modal fusion.…”
Section: Introduction (mentioning, confidence: 99%)
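The efficiency gain described in the last statement comes from never materialising the full outer-product tensor. The sketch below captures that idea with rank-decomposed, modality-specific factors; it is a simplified illustration of the low-rank fusion mechanism, not the authors' released implementation, and all names, shapes, and initialisations are assumptions.

```python
import torch
import torch.nn as nn

class LowRankFusion(nn.Module):
    """Low-rank multimodal fusion in the spirit of LMF (illustrative sketch)."""

    def __init__(self, input_dims, output_dim, rank):
        super().__init__()
        self.rank = rank
        # One set of rank factors per modality (the "modality-specific factors").
        self.factors = nn.ParameterList(
            [nn.Parameter(torch.randn(rank, d + 1, output_dim) * 0.01)
             for d in input_dims])
        self.fusion_weights = nn.Parameter(torch.randn(1, rank) * 0.01)
        self.fusion_bias = nn.Parameter(torch.zeros(1, output_dim))

    def forward(self, features):
        batch = features[0].size(0)
        one = torch.ones(batch, 1, device=features[0].device)
        fused = None
        for f, factor in zip(features, self.factors):
            f = torch.cat([f, one], dim=1)                 # append constant 1
            proj = torch.einsum('bd,rdo->bro', f, factor)  # (batch, rank, out)
            # Element-wise product of per-modality factors replaces the
            # explicit outer-product tensor; cost is linear in #modalities.
            fused = proj if fused is None else fused * proj
        # Collapse the rank dimension with learned weights.
        out = torch.einsum('br,bro->bo',
                           self.fusion_weights.expand(batch, -1), fused)
        return out + self.fusion_bias


# usage with hypothetical text/audio/video feature sizes
fusion = LowRankFusion(input_dims=[300, 74, 35], output_dim=64, rank=4)
text, audio, video = torch.randn(8, 300), torch.randn(8, 74), torch.randn(8, 35)
print(fusion([text, audio, video]).shape)  # torch.Size([8, 64])
```

Whereas the TFN-style sketch above yields a fusion vector whose size grows multiplicatively with the modality dimensions, this low-rank form only does rank × output_dim work per modality, which is the efficiency argument the citing statement paraphrases.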