2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr42600.2020.00377
Unbiased Scene Graph Generation From Biased Training

Cited by 570 publications
(563 citation statements)
References 44 publications
“…We provide additional experimental results on Visual Genome (VG) dataset [52]. We follow [19], [21], [29] to adopt the most widely-used dataset split which consists of 108K images and includes the most frequent 150 object classes and 50 predicates. When evaluating visual relationship detection/scene graph generation on VG, there are three common evaluation modes including (1) Predicate Classification (PredCls): ground truth bounding boxes and object labels are given, (2) Scene Graph Classification (SGCls): only ground truth boxes given, and (3) Scene Graph Detection (SGDet): nothing other than input images is given.…”
Section: A. Results on Visual Genome
confidence: 99%
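The three evaluation modes above all report recall over predicted (subject, predicate, object) triplets; only the inputs given to the model differ. A minimal sketch of the shared Recall@K metric (function and variable names are illustrative, not from the cited papers):

```python
def recall_at_k(gt_triplets, pred_triplets, k=50):
    """Recall@K: fraction of ground-truth (subject, predicate, object)
    triplets recovered among the top-k scored predictions.
    pred_triplets is assumed to be sorted by confidence, highest first."""
    top_k = set(pred_triplets[:k])
    hits = sum(1 for t in gt_triplets if t in top_k)
    return hits / len(gt_triplets) if gt_triplets else 0.0


# Example: one of two ground-truth relations is recovered in the top-2.
gt = [("man", "riding", "horse"), ("horse", "on", "grass")]
pred = [("man", "riding", "horse"), ("man", "near", "horse")]
print(recall_at_k(gt, pred, k=2))  # 0.5
```

Under PredCls the model scores predicates for given box/label pairs; under SGCls it must also classify object labels; under SGDet it must detect boxes, labels, and predicates from the raw image, so the same metric becomes progressively harder.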
“…Both SMN [21] and KERN [25] exploit this property, using the frequency bias and object co-occurrence, respectively. However, relying on such bias can in turn undermine generalization, as demonstrated by the mean-recall comparisons in recent works (e.g., [29]).…”
Section: A. Results on Visual Genome
confidence: 99%
“…Given an image-text pair (I, w), we first extract the visual scene graph G from the image with an off-the-shelf scene graph generator (Tang et al, 2020). A scene graph is a directed graph with the nodes representing the objects and the edges depicting their pairwise relationships.…”
Section: Cross-modal Alignment with Visual Scene Graph Encoding
confidence: 99%
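The directed-graph structure described above can be sketched as a small container type, with object nodes and labeled edges for pairwise relations (this is an illustrative data structure, not the cited system's actual implementation):

```python
class SceneGraph:
    """Directed graph: nodes are detected objects, edges are
    (subject, predicate, object) relations between node ids."""

    def __init__(self):
        self.nodes = {}   # node_id -> object label, e.g. "dog"
        self.edges = []   # (subject_id, predicate, object_id)

    def add_object(self, node_id, label):
        self.nodes[node_id] = label

    def add_relation(self, subject_id, predicate, object_id):
        self.edges.append((subject_id, predicate, object_id))

    def triplets(self):
        """Resolve edges to labeled (subject, predicate, object) triplets."""
        return [(self.nodes[s], p, self.nodes[o]) for s, p, o in self.edges]


# Example: a tiny graph for "dog sitting on grass".
g = SceneGraph()
g.add_object(0, "dog")
g.add_object(1, "grass")
g.add_relation(0, "sitting on", 1)
print(g.triplets())  # [('dog', 'sitting on', 'grass')]
```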
“…In implementation, we first embed tokens in both the text sequence w and scene graph triplets (extracted by SGG (Tang et al, 2020)) with a pretrained BERT embedder (Devlin et al, 2019). We then extract the visual embedding of each image region and also the union region of each triplet with the Faster R-CNN component (Ren et al, 2015) used in the bottom-up-attention (Anderson et al, 2018).…”
Section: Cross-modal Alignment with Visual Scene Graph Encoding
confidence: 99%