2023
DOI: 10.1587/transinf.2022dlp0002

A Visual Question Answering Network Merging High- and Low-Level Semantic Information

Abstract: Visual Question Answering (VQA) models usually use deep attention mechanisms to learn fine-grained visual content from images and textual content from questions. However, a deep attention mechanism learns only high-level semantic information and ignores the impact of low-level semantic information on answer prediction. To address this, we design a High- and Low-Level Semantic Information Network (HLSIN), which employs two strategies to achieve the fusion of high-level semantic information and low-level semantic info…
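The abstract describes fusing low-level features from early attention layers with high-level features from the final layer. As a minimal sketch of one way such a fusion could work (the paper's actual HLSIN design is not detailed in the truncated abstract, so the gate-based scheme, module names, and dimensions below are all illustrative assumptions):

```python
# Hypothetical sketch of gate-based fusion of high- and low-level
# semantic features, in the spirit of the HLSIN abstract. This is NOT
# the paper's architecture: the gating scheme and dimensions are assumed.
import torch
import torch.nn as nn


class GatedSemanticFusion(nn.Module):
    def __init__(self, dim: int = 512):
        super().__init__()
        # Learn a per-dimension gate from the concatenated features.
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.proj = nn.Linear(dim, dim)

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        # low:  pooled features from an early attention layer  (B, dim)
        # high: pooled features from the final attention layer (B, dim)
        g = self.gate(torch.cat([low, high], dim=-1))
        # Convex combination: the gate decides, per dimension, how much
        # low-level detail to mix into the high-level representation.
        fused = g * low + (1.0 - g) * high
        return self.proj(fused)


if __name__ == "__main__":
    fusion = GatedSemanticFusion(dim=512)
    low = torch.randn(4, 512)   # e.g. early-layer attention output
    high = torch.randn(4, 512)  # e.g. last-layer attention output
    print(fusion(low, high).shape)  # torch.Size([4, 512])
```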

Cited by 0 publications
References 36 publications
