2021
DOI: 10.1609/aaai.v35i12.17289
Learning Modality-Specific Representations with Self-Supervised Multi-Task Learning for Multimodal Sentiment Analysis

Abstract: Representation learning is a significant and challenging task in multimodal learning. Effective modality representations should contain two parts of characteristics: the consistency and the difference. Due to the unified multimodal annotation, existing methods are restricted in capturing differentiated information. However, additional unimodal annotations incur high time and labor costs. In this paper, we design a label generation module based on the self-supervised learning strategy to acquire independent…

Cited by 300 publications (100 citation statements) · References 20 publications
“…Early fusion methods that concatenate multimodal data at the input level, such as [10], [12], can learn inter-modality dynamics, but they have limitations in learning intra-modality dynamics. Late fusion methods that integrate different modalities at the prediction level, such as [13], [14], focus on modeling intra-modality dynamics rather than inter-modality dynamics. Some studies introduced additional modules to model both intra-modality and inter-modality dynamics.…”
Section: Related Work, A. Multimodal Sentiment Analysis
confidence: 99%
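The early-vs-late fusion contrast in the statement above can be illustrated with toy linear predictors. This is a minimal sketch, not code from the cited papers; the function names, feature shapes, and the choice of averaging as the decision-level combiner are all illustrative assumptions.

```python
import numpy as np

def early_fusion(feats, w):
    """Early fusion: concatenate modality features at the input level and
    apply one shared (here linear) predictor. Cross-modal interactions are
    possible, but no module is dedicated to intra-modality dynamics."""
    joint = np.concatenate(feats)
    return float(joint @ w)

def late_fusion(feats, ws):
    """Late fusion: score each modality with its own predictor and combine
    only at the decision level. Each modality is modeled in isolation, so
    inter-modality dynamics are largely lost."""
    scores = [float(f @ w) for f, w in zip(feats, ws)]
    return float(np.mean(scores))

# Toy text/audio features (values purely illustrative)
text_feat = np.array([1.0, 2.0])
audio_feat = np.array([3.0])
print(early_fusion([text_feat, audio_feat], np.ones(3)))                # 6.0
print(late_fusion([text_feat, audio_feat], [np.ones(2), np.ones(1)]))   # 3.0
```

The "additional modules" mentioned in the statement typically sit between these two extremes, modeling both intra- and inter-modality dynamics explicitly.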
“…MISA [7] introduced a modality-invariant encoder that projects each modality to a common space and used Central Moment Discrepancy (CMD) [8] as a similarity loss to align cross-modal representations within the common space. Self-MM [14] presented a label generation module based on self-supervised learning to acquire independent unimodal supervisions. There were also studies focused on changes in word representations.…”
Section: Related Work, A. Multimodal Sentiment Analysis
confidence: 99%
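The Central Moment Discrepancy (CMD) loss mentioned above aligns cross-modal representations by matching the means and higher-order central moments of two feature distributions. A minimal numpy sketch follows; the function name and signature are my own, and the per-order normalization constants from the original CMD formulation are omitted for brevity.

```python
import numpy as np

def cmd(x, y, n_moments=5):
    """Unnormalized Central Moment Discrepancy between two sample sets.

    x, y: arrays of shape (n_samples, dim). Returns the Euclidean distance
    between the empirical means plus the distances between the 2nd..K-th
    empirical central moments, summed over orders."""
    mx, my = x.mean(axis=0), y.mean(axis=0)
    dist = np.linalg.norm(mx - my)      # first-order term: mean difference
    cx, cy = x - mx, y - my             # centered samples
    for k in range(2, n_moments + 1):
        # distance between k-th central moments, coordinate-wise
        dist += np.linalg.norm((cx ** k).mean(axis=0) - (cy ** k).mean(axis=0))
    return dist
```

Used as a similarity loss, minimizing `cmd` over the modality-invariant projections pulls the per-modality feature distributions toward one another in the common space.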