2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019
DOI: 10.1109/cvpr.2019.00408

Attentive Relational Networks for Mapping Images to Scene Graphs

Abstract: Scene graph generation refers to the task of automatically mapping an image into a semantic structural graph, which requires correctly labeling each extracted object and their interaction relationships. Despite the recent success in object detection using deep learning techniques, inferring complex contextual relationships and structured graph representations from visual data remains a challenging topic. In this study, we propose a novel Attentive Relational Network that consists of two key modules with an obj…

Cited by 156 publications
(86 citation statements)
References 37 publications
“…or tree-near-?. Such intuition has been empirically shown benefits in boosting SGG [62,7,28,30,29,71,20,73,58,13,44,59,45]. More specifically, these methods use a conditional random field [79] to model the joint distribution of nodes and edges, where the context is incorporated by message passing among the nodes through edges via a multi-step meanfield approximation [26]; then, the model is optimized by the sum of cross-entropy (XE) losses of nodes (e.g., objects) and edges (e.g., relationships).…”
Section: Introduction (mentioning)
confidence: 93%
“…Ref. [29] proposed an attentive relational network, which use self-attention mechanism to sparse the connections in the graph. However, the built graph is static and cannot simulate the process of message propagation.…”
Section: Related Work (mentioning)
confidence: 99%
“…Implementation Details We implement our model based on TensorFlow [33] framework on the NVIDIA 2080 Ti GPU. Similar to prior works for scene graph generation [13,28,29], we adopt Faster R-CNN detector (with VGG16 pretrained in ImageNet dataset) [1] as backbone in feature extraction module. During training, the number of proposals from RPN is 256.…”
Section: Dataset and Implementation Details (mentioning)
confidence: 99%
“…Generating scene graphs from visual features [22,24,23,4,3,14,26,20,13,5] is a relatively explored task. Wan et al [19] specifically predicts new triplets for scene graph completion using existing scene graphs and visual features.…”
Section: Related Work (mentioning)
confidence: 99%