Published: 2021
DOI: 10.1109/access.2021.3126782

Targeted Aspect-Based Multimodal Sentiment Analysis: An Attention Capsule Extraction and Multi-Head Fusion Network

Cited by 31 publications (9 citation statements)
References 19 publications
“…Wang et al. [41] designed an Attention Capsule Extraction and Multi-head Fusion Network (EF-Net) for MABSA, the basic framework of which is shown in Figure 5. EF-Net extracts image features using ResNet-152 and inputs them into a single-layer capsule network to obtain the position information of the target in the image.…”
Section: Attention Mechanism-based MASC Methods
Citation type: mentioning
Confidence: 99%
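The image branch this statement describes (ResNet-152 regional features passed through a single-layer capsule network) can be sketched as below. This is a minimal sketch, not EF-Net's published configuration: the capsule count, capsule dimension, frozen backbone, and absence of routing iterations are all illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

def squash(s, dim=-1, eps=1e-8):
    # Standard capsule squashing non-linearity (Sabour et al., 2017):
    # shrinks short vectors toward zero, scales long ones toward unit length.
    norm_sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * s / torch.sqrt(norm_sq + eps)

class ImageCapsuleEncoder(nn.Module):
    """Hypothetical sketch: ResNet-152 regional features fed to one
    capsule projection, roughly as the citing paper describes EF-Net."""
    def __init__(self, num_caps=16, caps_dim=32):
        super().__init__()
        backbone = models.resnet152(weights="IMAGENET1K_V1")
        # Keep everything up to the last conv block -> (B, 2048, 7, 7).
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])
        for p in self.cnn.parameters():
            p.requires_grad = False          # assumption: frozen backbone
        # One linear capsule projection per region, no routing iterations
        # (a "single-layer" capsule network).
        self.to_caps = nn.Linear(2048, num_caps * caps_dim)
        self.caps_dim = caps_dim

    def forward(self, images):               # images: (B, 3, 224, 224)
        feats = self.cnn(images)              # (B, 2048, 7, 7)
        regions = feats.flatten(2).transpose(1, 2)   # (B, 49, 2048)
        caps = self.to_caps(regions)           # (B, 49, num_caps*caps_dim)
        caps = caps.view(caps.size(0), -1, self.caps_dim)
        return squash(caps)                    # capsule vectors per region
```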
“…Zhang et al. (2021) [28] employed a pair of memory networks to capture intra-modal information and extract interaction information between different modalities, then designed a discriminative matrix to supervise the fusion of modal information. Gu et al. (2021) [29] designed an Attention Capsule Extraction and Multi-Head Fusion Network for aspect-level multimodal sentiment classification. Through the integration of multi-head attention mechanisms and capsule networks, it captures interactions between multimodal inputs.…”
Section: Related Work
Citation type: mentioning
Confidence: 99%
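The multi-head fusion step mentioned here could look roughly like the following cross-modal attention sketch, in which text representations attend over image capsule features. The model width, head count, and residual-plus-norm placement are assumptions for illustration, not parameters reported by the paper.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Sketch of multi-head cross-modal attention: aspect-aware text
    states (queries) attend over projected image capsules (keys/values)."""
    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, text, image):
        # text:  (B, T, d_model) encoded tokens; image: (B, R, d_model)
        fused, _ = self.attn(query=text, key=image, value=image)
        return self.norm(text + fused)   # residual connection + layer norm
```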
“…• EF-NET [41]: Uses a network based on Multi-Head Attention (MHA) and ResNet-152 to process text and images, respectively, and employs MHA to capture interactions between multimodal inputs.…”
Section: B. Baselines
Citation type: mentioning
Confidence: 99%
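Putting this baseline description together with the two sketches above, a minimal end-to-end forward pass might read as follows. The projection size, mean pooling, and three-class sentiment head are guesses for illustration, not the architecture the authors report.

```python
import torch
import torch.nn as nn

# Illustrative wiring only, reusing the two sketches above.
img_enc = ImageCapsuleEncoder()          # ResNet-152 + single capsule layer
proj = nn.Linear(32, 256)                # caps_dim -> d_model (assumption)
fusion = CrossModalFusion(d_model=256)   # multi-head cross-modal attention
classifier = nn.Linear(256, 3)           # negative / neutral / positive

images = torch.randn(4, 3, 224, 224)     # dummy image batch
text = torch.randn(4, 20, 256)           # stand-in for encoded text tokens

caps = proj(img_enc(images))             # (4, regions*caps, 256)
out = fusion(text, caps)                 # (4, 20, 256) fused text states
logits = classifier(out.mean(dim=1))     # pooled sentence-level prediction
```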