2022
DOI: 10.1587/transinf.2021edp7189

Dual Self-Guided Attention with Sparse Question Networks for Visual Question Answering

citations
Cited by 7 publications
(6 citation statements)
references
References 41 publications
“…At present, AI is developing rapidly. As one of the applications of intelligent systems, VQA has attracted more Chinese researchers and scientists, focusing on this frontier field and promoting research progress in related fields (Guo & Han, 2022; Guo & Han, 2023; Miao et al, 2022b; Miao et al, 2022a; Peng et al, 2022a; Shen et al, 2022; Liu et al, 2022a).…”
Section: Survey Methodology
Type: mentioning
confidence: 99%
“…While these models achieve great results, they can significantly improve the model’s performance by pre-training the base model and transferring it to downstream tasks based on large-scale visual and question datasets. However, in the research of network models for VQA tasks and downstream tasks, utilizing the end-to-end approach [7–9, 12, 16, 17, 20] to train network models can better capture the modal information of images and texts and can effectively improve model performance.…”
Section: Related Work
Type: mentioning
confidence: 99%
“…The image representation can guide the attention of the question, and the performance of the question can guide the attention of the image. Nam et al [9] proposed a dual attention network to collect necessary information through multi-step processing of specific areas in the image and keywords in the question. However, because these attention models learn multimodal coarse interaction examples, it is difficult to infer the correlation images and questions between them.…”
Section: Related Work
Type: mentioning
confidence: 99%