2021
DOI: 10.48550/arxiv.2102.04830
Preprint

Learning Modality-Specific Representations with Self-Supervised Multi-Task Learning for Multimodal Sentiment Analysis

Abstract: Representation learning is a significant and challenging task in multimodal learning. Effective modality representations should contain two aspects: the consistency and the difference. Due to the unified multimodal annotation, existing methods are restricted in capturing differentiated information. However, additional uni-modal annotations come at a high cost in time and labor. In this paper, we design a label generation module based on the self-supervised learning strategy to acquire independent unimodal…
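For intuition, here is a rough, hypothetical sketch of the training scheme the abstract outlines: a fused multimodal head trained on the human annotation plus unimodal heads trained on self-supervised pseudo-labels. The class name, dimensions, and loss weights are illustrative assumptions, not the paper's actual label generation module.

```python
# Hypothetical sketch of joint multimodal + unimodal multi-task training.
# Names, dimensions, and weights are assumptions, not the paper's exact module.
import torch.nn as nn

class MultiTaskHeads(nn.Module):
    def __init__(self, dim_m=128, dim_t=64, dim_a=64, dim_v=64):
        super().__init__()
        self.head_m = nn.Linear(dim_m, 1)  # fused multimodal prediction
        self.head_t = nn.Linear(dim_t, 1)  # text-only prediction
        self.head_a = nn.Linear(dim_a, 1)  # audio-only prediction
        self.head_v = nn.Linear(dim_v, 1)  # visual-only prediction

    def forward(self, h_m, h_t, h_a, h_v):
        return {"m": self.head_m(h_m), "t": self.head_t(h_t),
                "a": self.head_a(h_a), "v": self.head_v(h_v)}

def multitask_loss(preds, y_m, y_uni, weights=(1.0, 0.5, 0.5, 0.5)):
    """y_m: human multimodal label; y_uni: dict of self-supervised
    unimodal pseudo-labels for the auxiliary tasks."""
    l1 = nn.L1Loss()
    loss = weights[0] * l1(preds["m"], y_m)           # main multimodal task
    for w, k in zip(weights[1:], ("t", "a", "v")):
        loss = loss + w * l1(preds[k], y_uni[k])      # unimodal auxiliary tasks
    return loss
```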

Cited by 10 publications (21 citation statements)
References 15 publications
“…Dai et al (2021a) proposed a multi-task learning approach using weak supervision for multimodal emotion recognition. Yu et al (2021) proposed a way to fuse features from different modalities by combining self-supervised and multi-task learning. Although self-supervised and multi-task learning can effectively alleviate the problem of small samples, how to perform efficient cross-modal interactions is still a tremendously challenging issue for researchers.…”
Section: Related Work
confidence: 99%
“…In the previous works (Sahay et al 2020; Rahman et al 2020; Hazarika, Zimmermann, and Poria 2020; Yu et al 2021; Dai et al 2021a), Transformers (Vaswani et al 2017) are mostly used for unaligned multimodal emotion recognition. Typically, Tsai et al (2019a) proposed the Multimodal Transformer (MulT) method to fuse information from different modalities in unaligned sequences without explicitly aligning the data.…”
Section: Introduction
confidence: 99%
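As an illustration of the MulT-style crossmodal fusion mentioned in the statement above, the following minimal sketch uses standard PyTorch multi-head attention with queries from a target modality and keys/values from a source modality, so sequences of different lengths can be fused without explicit alignment. Dimensions and module names are assumptions for this example, not the original implementation.

```python
# Minimal sketch of crossmodal attention between unaligned sequences
# (an illustration only, not the original MulT code).
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, d_model=40, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, target, source):
        # target: (B, L_t, d), source: (B, L_s, d); L_t and L_s may differ.
        fused, _ = self.attn(query=target, key=source, value=source)
        return self.norm(target + fused)  # residual connection

# Text attends to audio without any word-level alignment step.
text, audio = torch.randn(2, 50, 40), torch.randn(2, 375, 40)
fused_text = CrossModalAttention()(text, audio)   # shape (2, 50, 40)
```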
“…Existing methods to learn unified representations are grouped into two categories: through loss back-propagation or geometric manipulation in the feature spaces. The former only tunes the parameters based on back-propagated gradients from the task loss (Tsai et al., 2019a), reconstruction loss (Mai et al., 2020), or auxiliary task loss (Yu et al., 2021). The latter additionally rectifies the spatial orientation of unimodal or multimodal representations by matrix decomposition (Liu et al., 2018) or Euclidean measure optimization (Sun et al., 2020).…”
Section: Introduction
confidence: 99%
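A compact, purely illustrative sketch of the contrast drawn in the statement above: the first route simply adds extra terms (reconstruction, auxiliary tasks) to the back-propagated objective, while the second adds an explicit geometric term over the representation spaces. The weights and function names here are assumptions, not taken from any cited paper.

```python
# Illustrative combined objective (not from any cited paper): a main task loss
# plus a back-propagated auxiliary term and an explicit Euclidean alignment term.
import torch.nn.functional as F

def total_loss(pred, label, recon, inputs, z_text, z_audio,
               w_recon=0.1, w_geo=0.1):
    task = F.l1_loss(pred, label)                    # main task loss
    recon_loss = F.mse_loss(recon, inputs)           # auxiliary reconstruction loss
    # Geometric term: mean squared Euclidean distance between unimodal
    # representations (assumes z_text and z_audio share the same dimensionality).
    geo = ((z_text - z_audio) ** 2).sum(dim=-1).mean()
    return task + w_recon * recon_loss + w_geo * geo
```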
“…Previous work (Liu and Zhang, 2012; Tian et al, 2020) mainly focused on text sentiment analysis and achieved promising results. Recently, with the development of short video applications, multimodal sentiment analysis has obtained more attention (Tsai et al, 2019; Yu et al, 2021), and many datasets (Li et al, 2017; Poria et al, 2018) have been proposed to advance its development. However, current multimodal sentiment analysis datasets usually follow the traditional sentiment system (positive, neutral and negative) or emotion system (happy, sad, surprise and so on), which is far from satisfactory, especially for video recommendation application scenarios.…”
Section: Introduction
confidence: 99%