2018
DOI: 10.1016/j.cviu.2017.12.004

Image Understanding using vision and reasoning through Scene Description Graph

Cited by 69 publications (35 citation statements); references 16 publications.
“…Among these works, [55] uses scene graphs for explainable and explicit reasoning with structured knowledge. Aditya et al [56] use directed and labeled scene description graph for reasoning in image captioning, retrieval, and visual question answering applications. In another recent work, [57] introduces a method for globally reasoning over regional relations in a single image.…”
Section: Related Work
confidence: 99%
“…A stronger correlation with human judgements indicates that a metric captures the features that humans look for, while assessing a candidate caption. In order to measure the sentence-level correlation of our proposed metric with human judgements we use the COMPOSITE dataset [2] which contains human judgements for 11,985 candidate captions and their image counterparts. The images in this dataset are obtained from MS COCO, Flickr8k and Flickr30k datasets, whereas, the associated captions consist of human generated captions (sourced from the aforementioned datasets) and machine generated captions (using two captioning models [2], [19]).…”
Section: Correlation With Human Judgements
confidence: 99%
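The excerpt above measures how well an automatic caption metric tracks human ratings via sentence-level correlation. A minimal sketch of that comparison, using a plain implementation of Pearson's r and made-up toy scores (the metric values and human ratings below are illustrative, not from the COMPOSITE dataset):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy example: candidate-metric scores vs. human ratings (1-5 scale)
# for five captions. Real evaluations use thousands of judged captions.
metric_scores = [0.91, 0.42, 0.77, 0.15, 0.60]
human_ratings = [5, 2, 4, 1, 3]
r = pearson(metric_scores, human_ratings)
```

A higher r means the metric ranks captions more like the human judges do; published comparisons often report Kendall's tau or Spearman's rho as well, since they depend only on rank order.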
“…Then, the region features appropriate for each layer are determined through two fully connected layers. In the case of the relationship detection layer, the region features are determined by inputting the two bounding boxes of the two objects comprising the relationship region, as well as the visual features, to the fully connected layer, as shown in (11). This aids identification of the spatial relationship (e.g., a relationship indicated by an "in" or "on" preposition) through the locational relationships between the two objects.…”
Section: Candidate Region Proposal
confidence: 99%
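The quoted passage describes forming a relation-region feature by feeding the two objects' bounding boxes together with visual features through fully connected layers. A hedged sketch of that idea (not the cited paper's code; the layer sizes, random weights, and ReLU nonlinearity are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def fc(x, w, b):
    """Fully connected layer with ReLU activation: max(0, xW + b)."""
    return np.maximum(0.0, x @ w + b)

vis_dim, hid_dim = 16, 8
# Normalised (x1, y1, x2, y2) boxes for the two objects in the relation.
box_subj = np.array([0.10, 0.20, 0.45, 0.70])
box_obj  = np.array([0.50, 0.25, 0.90, 0.80])
visual_feat = rng.standard_normal(vis_dim)  # stand-in for a CNN feature

# Concatenate both boxes with the visual feature, then apply two FC layers,
# yielding a feature for the relationship detection layer.
x = np.concatenate([box_subj, box_obj, visual_feat])
w1, b1 = rng.standard_normal((x.size, hid_dim)), np.zeros(hid_dim)
w2, b2 = rng.standard_normal((hid_dim, hid_dim)), np.zeros(hid_dim)
relation_feat = fc(fc(x, w1, b1), w2, b2)
```

Including the raw box coordinates lets the layers learn spatial predicates such as "in" or "on" directly from the relative positions of the two objects.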
“…Further, the scene graph approach has lower difficulty as a learning problem, because there is no need to consider complex grammatical structures. Moreover, the knowledge graphs acquired from images can be easily combined with numerous existing background datasets and prior knowledge datasets and can potentially exert power in more application areas [11][12][13]. Image captions and scene graphs have common characteristics in that they are generated with consideration of the objects in the images and the relationships between those objects.…”
Section: Introduction
confidence: 99%
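The last excerpt notes that scene graphs extracted from images can be combined with existing background-knowledge datasets. A minimal sketch of that combination, treating both the scene graph and the background knowledge as sets of (subject, predicate, object) triples (the triples themselves are invented for illustration):

```python
# Scene graph produced from an image: directed, labeled edges as triples.
scene_graph = {
    ("man", "riding", "horse"),
    ("horse", "on", "beach"),
}

# Background / prior knowledge from an external dataset, in the same form.
background = {
    ("horse", "is_a", "animal"),
    ("beach", "is_a", "outdoor_scene"),
}

# Because both use the same triple representation, combining them is a
# plain set union; downstream reasoning can then traverse either source.
knowledge = scene_graph | background

def objects_of(graph, subject, predicate):
    """All objects linked to `subject` by `predicate` in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}
```

This uniform triple store is what makes scene graphs easy to join with knowledge bases: a query such as `objects_of(knowledge, "man", "riding")` can be answered without caring whether the edge came from vision or from prior knowledge.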