2022
DOI: 10.48550/arxiv.2203.11654
Preprint

Fine-Grained Scene Graph Generation with Data Transfer

Abstract: Scene graph generation (SGG) aims to extract (subject, predicate, object) triplets from images. Recent works have made steady progress on SGG and provide useful tools for high-level vision and language understanding. However, due to data distribution problems, including long-tail distribution and semantic ambiguity, the predictions of current SGG models tend to collapse to a few frequent but uninformative predicates (e.g., on, at), which limits the practical application of these models in downstream tasks. To…

Cited by 3 publications
(10 citation statements)
References 29 publications
“…Since the predicates in scene graphs are highly relevant to the context, the direct enhancement methods based on class distribution are inapplicable for the balanced scene graph generation. Thus, some methods in SGG [48,52] explore visually relevant relationships from the external knowledge base to enhance the existing dataset. However, previous approaches require additional hyper-parameters or hand-designed enhancement rules limited to pre-defined scene graphs.…”
Section: Related Work (mentioning)
confidence: 99%
“…Although several weakly-supervised approaches could improve visual relation modeling through different knowledge bases [50,36,48,52], they require hand-designed rules and thus have limited generalization ability. Thus, we attempt to utilize the linguistic knowledge of pre-trained language models to boost fine-grained predicates in a low-resource way and make language models flexibly aware of scenes through visual prompts, as shown in the visually-prompted language model module of CaCao in Figure 2 (a).…”
Section: Visually-prompted Language Model (mentioning)
confidence: 99%