2022
DOI: 10.3390/electronics11111778

Multi-Modal Alignment of Visual Question Answering Based on Multi-Hop Attention Mechanism

Abstract: The alignment of information between the image and the question is of great significance in the visual question answering (VQA) task. Self-attention is commonly used to generate attention weights between the image and the question, and these weights align the two modalities: through them, the model can select the regions of the image that are relevant to the question. However, when the self-attention mechanism is used, the attention weight between two objects is determined only by the representation…
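As a rough sketch of the alignment idea the abstract describes, the snippet below lets question token features attend over image region features for several hops, so each hop can refine the attention weights using the visual context gathered by the previous one. The class name, dimensions, per-hop projections, and residual update are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHopAlignment(nn.Module):
    """Illustrative multi-hop attention between question tokens and image regions."""
    def __init__(self, dim: int, hops: int = 2):
        super().__init__()
        self.hops = hops
        # separate projections per hop so each hop can refine the alignment
        self.q_proj = nn.ModuleList([nn.Linear(dim, dim) for _ in range(hops)])
        self.v_proj = nn.ModuleList([nn.Linear(dim, dim) for _ in range(hops)])

    def forward(self, v: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
        # v: (batch, regions, dim) image region features
        # q: (batch, tokens, dim)  question token features
        ctx = q
        for i in range(self.hops):
            # attention weights between every question token and every image region
            scores = self.q_proj[i](ctx) @ self.v_proj[i](v).transpose(1, 2)
            attn = F.softmax(scores / ctx.size(-1) ** 0.5, dim=-1)
            # fold the attended visual context back into the query for the next hop
            ctx = ctx + attn @ v
        return ctx  # question features enriched with aligned visual context

# Example: 36 detected regions, a 14-token question, 512-dim features.
align = MultiHopAlignment(dim=512, hops=2)
out = align(torch.randn(8, 36, 512), torch.randn(8, 14, 512))  # -> (8, 14, 512)
```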

Cited by 6 publications (4 citation statements)
References 38 publications
“…Here, self-attention and guided attention are models that learn interactions between different parts of the image and the question, enhancing the accuracy of the answer by learning the rich interactions between the visual and language streams. The model proposed by (19) introduces a multi-hop attention alignment method that enriches surrounding information when using self-attention.…”
Section: Attention Based Approaches for Visual Question Answering (mentioning)
confidence: 99%
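For context on the guided-attention idea mentioned in the statement above, a common formulation scores each image region against a pooled question vector and takes a softmax-weighted sum of the regions. The additive-attention form and layer sizes below are generic assumptions, not the cited model's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GuidedAttention(nn.Module):
    """A pooled question vector guides attention over image region features."""
    def __init__(self, dim: int, hidden: int = 512):
        super().__init__()
        self.w_v = nn.Linear(dim, hidden)  # project image regions
        self.w_q = nn.Linear(dim, hidden)  # project the question summary
        self.w_s = nn.Linear(hidden, 1)    # score each region

    def forward(self, v: torch.Tensor, q_vec: torch.Tensor) -> torch.Tensor:
        # v: (batch, regions, dim) image features, q_vec: (batch, dim) question summary
        joint = torch.tanh(self.w_v(v) + self.w_q(q_vec).unsqueeze(1))
        weights = F.softmax(self.w_s(joint).squeeze(-1), dim=-1)  # (batch, regions)
        return (weights.unsqueeze(-1) * v).sum(dim=1)             # attended image feature
```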
“…It enhances accuracy by utilizing a CNN that predicts bounding boxes for objects in an image. (54) CPDR (55), MulFA/UFSCAN (56), Bilinear Graph (57), AttReg (58), AMAM (16), Scene-text using PHOC (59), MGRF (60), Bottom-Up and Top-Down (61), DCAMN (39), Skill Concept (62), PGM (63), SR-OCE (64), RAMEN (65), CSST (66), Coarse-to-Fine (67), GMA (68), BLOCK (69), CapsAtt (32,40), Re-attention (70), CRN (71), CAT (11), shortcut (72), DAQC (15), MGFAN (73), MMMH (19), MSG (74), Fair-VQA (75), Attention map (5), SAVQA (76), MGAVQA (77), MuKEA (78), ACVRM (79), QD-GFN (23), Swap-Mix (80), CVA (17), HGNMN (26), SUPER (37), Uncertainty based (81), CLG (82), WSQG (83), VLR (84), LXMERT (85), SceneGATE (86)…”
Section: Visual Feature Extraction Techniques (mentioning)
confidence: 99%
“…Bahdanau used attention mechanisms to complete the task of machine translation for the first time [31]. Since then, various types of attention mechanisms have appeared, such as Co-Attention networks [32], Self-Attention networks [33] and Recurrent Attention networks [34].…”
Section: Introduction (mentioning)
confidence: 99%
“…That is, the observation can be adjusted toward the more informative features according to their relative importance, focusing the algorithm on the most relevant parts of the input and moving from global features to focused ones, thus saving resources and obtaining the most useful information quickly. The attention mechanism has arguably become one of the most important concepts in deep learning. Since Bahdanau, Cho & Bengio (2015) used an attention mechanism for machine translation, various variants have emerged, such as Co-Attention networks (Yang et al., 2019a; Han et al., 2021; Yu et al., 2019; Liu et al., 2021b; Lu et al., 2016; Sharma & Srivastava, 2022), Recurrent Attention networks (Osman & Samek, 2019; Ren & Zemel, 2017; Gan et al., 2019), and Self-Attention networks (Li et al., 2019; Fan et al., 2019; Ramachandran et al., 2019; Xia et al., 2022; Xiang et al., 2022; Yan, Silamu & Li, 2022). The effectiveness of visual information processing is considerably enhanced by all of these attention mechanisms, which also optimize VQA performance.…”
Section: Introduction (mentioning)
confidence: 99%
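To make the "relative importance" idea in the statement above concrete, here is a minimal, generic sketch (an assumed example, not taken from any of the cited works): raw relevance scores for the parts of an input are normalized with a softmax and used to weight those parts, so the most relevant ones dominate the pooled representation.

```python
import torch
import torch.nn.functional as F

def importance_pooling(features: torch.Tensor, scores: torch.Tensor) -> torch.Tensor:
    """Weight the parts of an input by their softmax-normalized relevance scores.

    features: (parts, dim) feature vectors, e.g. image regions or question words
    scores:   (parts,)     raw relevance score for each part
    """
    weights = F.softmax(scores, dim=0)  # relative importance, sums to 1
    return weights @ features           # (dim,) representation focused on relevant parts

# Three parts with scores 0.1, 2.0, 0.3: the second part dominates the output.
pooled = importance_pooling(torch.randn(3, 8), torch.tensor([0.1, 2.0, 0.3]))
```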