2023
DOI: 10.1049/cvi2.12186
DSGEM: Dual scene graph enhancement module‐based visual question answering

Abstract: Visual Question Answering (VQA) aims to answer a text question correctly by understanding the image content. Attention‐based VQA models mine the implicit relationships between objects from feature similarity, which neglects the explicit relationships between objects, for example, their relative positions. Most visual scene graph‐based VQA models exploit the relative positions or visual relationships between objects to construct the visual scene graph, but they suffer from semantic insufficiency…
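To make the contrast drawn in the abstract concrete, here is a minimal sketch (an illustrative assumption, not code from the paper) of how similarity-based attention yields only implicit pairwise relations, while bounding-box geometry supplies the explicit relative positions a visual scene graph can encode. The function names and tensor shapes are hypothetical:

```python
import torch
import torch.nn.functional as F

def implicit_attention(obj_feats):
    # obj_feats: (num_objects, d) region features from an object detector.
    # Relation weights come purely from feature similarity, so explicit
    # relations such as relative position are never represented.
    scores = obj_feats @ obj_feats.T / obj_feats.shape[-1] ** 0.5
    weights = F.softmax(scores, dim=-1)   # implicit pairwise relations
    return weights @ obj_feats            # relation-weighted features

def relative_positions(boxes):
    # boxes: (num_objects, 4) as (x1, y1, x2, y2).
    # Explicit spatial relations: centre offsets for every object pair,
    # the kind of information a visual scene graph attaches to its edges.
    centres = (boxes[:, :2] + boxes[:, 2:]) / 2
    return centres[:, None, :] - centres[None, :, :]  # (n, n, 2) offsets
```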

Cited by 1 publication (1 citation statement)
References 59 publications (123 reference statements)
“…The results following the best settings in the original paper are shown in Table 4. Here is an introduction to these methods: DSGEM [56] utilizes commonsense knowledge and syntactic structures to construct visual and textual scene graphs, explicitly assigning specific semantics to each edge relation. Two scene graph enhancement modules are proposed to propagate external and structural knowledge, providing clear guidance for feature interactions between objects.…”
Section: Influence of Different Stacking Depths
Confidence: 99%
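The citation statement above describes edge relations that each carry a specific semantic. A common way to realise that is relation-typed message passing in the spirit of R-GCN; the sketch below is an assumption about what one such enhancement step could look like, not the authors' implementation, and the class and variable names are hypothetical:

```python
import torch
import torch.nn as nn

class TypedEdgeGraphLayer(nn.Module):
    """One message-passing step in which every edge relation has its own
    transform, so a specific semantic is attached to each edge."""

    def __init__(self, dim: int, num_relations: int):
        super().__init__()
        self.rel_transforms = nn.ModuleList(
            nn.Linear(dim, dim) for _ in range(num_relations)
        )
        self.self_transform = nn.Linear(dim, dim)

    def forward(self, node_feats, edges):
        # node_feats: (num_nodes, dim); edges: iterable of (src, dst, rel_id).
        messages = torch.zeros_like(node_feats)
        for src, dst, rel in edges:
            # Transform the source feature with the relation-specific weight
            # and accumulate it at the destination node (out-of-place, so
            # the operation stays autograd-safe).
            msg = self.rel_transforms[rel](node_feats[src]).unsqueeze(0)
            messages = messages.index_add(0, torch.tensor([dst]), msg)
        return torch.relu(self.self_transform(node_feats) + messages)
```

As a usage example, with `layer = TypedEdgeGraphLayer(dim=512, num_relations=6)`, calling `layer(node_feats, [(0, 1, 2)])` sends object 0's feature to object 1 through the transform for relation type 2, mirroring how external and structural knowledge could be propagated along edges with assigned semantics.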