2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019
DOI: 10.1109/cvpr.2019.00855

Visual Query Answering by Entity-Attribute Graph Matching and Reasoning

Abstract: Visual Query Answering (VQA) is of great significance in offering people convenience: one can raise a question for details of objects, or high-level understanding about the scene, over an image. This paper proposes a novel method to address the VQA problem. In contrast to prior works, our method, which targets single-scene VQA, relies on graph-based techniques and involves reasoning. In a nutshell, our approach is centered on three graphs. The first graph, referred to as inference graph G_I, is constructed via …

Cited by 18 publications
(5 citation statements)
References 30 publications
“…IRG (Liu et al 2019) employs the graph relation for knowledge distillation. The graph knowledge is also used for visual query answering (Xiong et al 2019). However, different from these works, we aim at encouraging the discrimination of the learned deep embedding by regularizing the randomly constructed sub-graphs over data points to be consistent with each other, which is the obvious property of our defined 'Discriminative Graph' and discriminative feature distribution.…”
Section: Related Work (mentioning)
confidence: 99%
“…In the face clustering problem, several works attempted to modeling pair-wise relationships for face graph generation [45], [49]. To reason between different instances in visual question answering tasks, Xiong et al [47] proposed a graph matching module for investigating such relation. In sketch-based action recognition, several works showed that modeling interactions of sketch joint can achieve excellent performance [25], [26], [48].…”
Section: Related Work (mentioning)
confidence: 99%
“…Reasoning about the relation between instance over time in the video is critical for activity recognition [53]. In addition, modeling relations between vision objects have become a popular problem in computer vision [18], [45], [47], [51]. The most straightforward visual reasoning task is object relation reasoning in CLEVR benchmark [13], and significant efforts have been devoted to a variety of traditional visual tasks with pairwise relationship reasoning.…”
Section: Related Work (mentioning)
confidence: 99%
“…Most scene graph methods consist on an object detector, an attribute classifier and a relationship predictor [48,19,54,50,55,56]. Scene graphs have been used in multiple vision and language tasks, including image captioning [51,7,54] and VQA [36,32,46]. However, less attention has been paid to generating scene graphs from videos, in which relationships are both spatial and temporal.…”
Section: Related Work (mentioning)
confidence: 99%