Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence 2019
DOI: 10.24963/ijcai.2019/751

Adapting BERT for Target-Oriented Multimodal Sentiment Classification

Abstract: As an important task in Sentiment Analysis, Target-oriented Sentiment Classification (TSC) aims to identify the sentiment polarity over each opinion target in a sentence. However, existing approaches to this task rely primarily on textual content, ignoring other increasingly popular multimodal data sources (e.g., images) which can enhance the robustness of these text-based models. Motivated by this observation and inspired by the recently proposed BERT architecture, we study Target-oriented Multimod…

Cited by 145 publications (85 citation statements)
References 12 publications
“…The overall architecture of ZEN is shown in Figure 1, where the backbone model (character encoder) is BERT (Devlin et al., 2018), enhanced by n-gram information represented by a multi-layer encoder. Since the basis of BERT is well explained in previous studies (Devlin et al., 2018; Yu and Jiang, 2019), in this paper we focus on the details of ZEN, explaining how n-grams are processed and incorporated into the character encoder.…”
Section: ZEN
confidence: 99%
“…It can be assumed that the use of visual information in these systems, to a certain degree, compensates for the lack of genuine language understanding capabilities. Yu and Jiang (2019) reported that using both modalities together is more effective than using them individually for the task of Target-Oriented Sentiment Classification, which determines the sentiment toward different targets, for instance people or places. For this task, the authors propose a neural network that combines the BERT architecture (Devlin et al., 2019) with a target attention mechanism and self-attention layers to model intra- and inter-modality alignments.…”
Section: Crossmodal Interaction of Language and Vision
confidence: 99%
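The attention-based fusion described in the statement above can be illustrated with a minimal sketch. This is not the authors' implementation: the shapes, names, and the toy setup (a single target query attending over image-region features) are assumptions chosen only to show the scaled dot-product attention primitive that such intra-/inter-modality alignment layers stack.

```python
# Minimal scaled dot-product attention sketch (illustrative assumption,
# not the actual model from Yu and Jiang, 2019).
import numpy as np

def attention(query, key, value):
    """Compute softmax(Q K^T / sqrt(d)) V row by row."""
    d = query.shape[-1]
    scores = query @ key.T / np.sqrt(d)               # (n_q, n_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ value                            # (n_q, d)

# Toy example: one target representation attends over 4 image regions,
# yielding a target-conditioned visual summary (inter-modality alignment).
rng = np.random.default_rng(0)
target = rng.normal(size=(1, 8))    # hypothetical target embedding
regions = rng.normal(size=(4, 8))   # hypothetical image-region features
aligned = attention(target, regions, regions)
print(aligned.shape)                # (1, 8)
```

Intra-modality alignment (self-attention) is the same operation with query, key, and value all drawn from the same modality's token sequence.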
“…Comparison of using linguistic and visual information individually or together for Target-Oriented Sentiment Classification, evaluated by Yu and Jiang (2019) on two publicly available data sets.…”
confidence: 99%
“…In this study, we conducted a sentiment classification task on memes by combining features from images and text, then classified them as positive, negative, or neutral. By using memes as data, we expect better sentiment classification results, because additional data sources such as images can enhance the robustness of models (Yu and Jiang, 2019). The data used were obtained from SemEval 2020 Task 8: Memotion Analysis (Sharma et al., 2020).…”
Section: Introduction
confidence: 99%
“…The results show that the overall performance of combining features from text and images is better than using only image or only text features on both datasets. Another study was conducted by Yu and Jiang (2019), who proposed a target-oriented multimodal model built on pre-trained language representations (the BERT architecture) for detecting sentiment.…”
Section: Introduction
confidence: 99%