2022
DOI: 10.1109/access.2022.3174215
VAE-Based Adversarial Multimodal Domain Transfer for Video-Level Sentiment Analysis

Abstract: Video-level sentiment analysis is a challenging task that requires systems to obtain discriminative multimodal representations capable of capturing differences in sentiment across modalities. However, because the distributions of the modalities are diverse and unified multimodal labels are not always adaptable to unimodal learning, the distance between unimodal representations increases, preventing systems from learning discriminative multimodal representations. In this paper, to obtain more dis…

Cited by 10 publications (6 citation statements)
References 29 publications
“…1. MOSEI drops the data lacking modalities to fairly evaluate recent modality fusion-based methods [20]. We compared the video segment IDs of each data point for each modality and saved only the data points associated with a common segment ID.…”
Section: Methods
confidence: 99%
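The filtering step quoted above — keeping only data points whose video segment ID appears in every modality — can be sketched as follows. This is an illustrative reconstruction, not the citing authors' code; the modality names and dictionary layout are assumptions.

```python
def filter_common_segments(modalities):
    """Keep only segment IDs present in every modality.

    modalities: dict mapping modality name -> {segment_id: features}.
    Returns the same structure restricted to the shared segment IDs.
    """
    common_ids = set.intersection(*(set(m) for m in modalities.values()))
    return {
        name: {sid: feats for sid, feats in m.items() if sid in common_ids}
        for name, m in modalities.items()
    }

# Toy example: only the segment ID shared by all three modalities survives.
text = {"video1[0]": [0.1], "video2[0]": [0.2]}
audio = {"video1[0]": [0.3]}
video = {"video1[0]": [0.4], "video3[0]": [0.5]}
filtered = filter_common_segments({"text": text, "audio": audio, "video": video})
```

After filtering, every modality contains exactly the same set of segment IDs, so fusion-based methods can be evaluated on complete multimodal data points.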
“…Existing works on multimodal transfer learning apply adversarial learning to regularize the embedding distributions across modalities, leading to effective multimodal fusion [14], [17], [18], [19], [20]. However, conventional systems are typically built on the assumption that all modalities are present, and missing modalities often lead to poor inference performance.…”
Section: Introduction
confidence: 99%
“…A variational autoencoder (VAE)-enabled adversarial multimodal domain transfer (VAE-AMDT) has been proposed to acquire more discriminative multimodal representations that can also enhance system performance; it is jointly trained with a multi-attention mechanism to reduce the distance between the unimodal representations [37]. The sentiment feature set was used to train a convolutional neural network (CNN) and a bidirectional long short-term memory (BiLSTM) network to predict sentiment classification labels [38].…”
Section: Literature Review
confidence: 99%
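At the core of a VAE such as the one in VAE-AMDT, each unimodal representation is mapped to a latent distribution, sampled with the reparameterization trick, and regularized by a KL term toward a standard normal prior. A minimal NumPy sketch of those two pieces (shapes and names are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var, rng):
    # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
    # so the sampling step stays differentiable w.r.t. mu and log_var.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    # KL(N(mu, diag(sigma^2)) || N(0, I)), summed over latent dimensions.
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

# Toy "unimodal representation" encoded as a 4-dim latent distribution.
mu = np.zeros(4)
log_var = np.zeros(4)
z = reparameterize(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
```

In the adversarial setting described in the quote, a discriminator would additionally push the latent codes of the different modalities toward a shared distribution, which is what reduces the distance between the unimodal representations.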
“…Different from previous studies [22], [23], our work focuses on learning joint representations and multimodal interaction. Joint representations can correlate information from different modalities, thus learning joint representations facilitates the process of fusion.…”
Section: B. Multimodal Representation Learning
confidence: 99%