2017 IEEE International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2017.121

Visual Relationship Detection with Internal and External Linguistic Knowledge Distillation

Abstract: Understanding the visual relationship between two objects involves identifying the subject, the object, and a predicate relating them. We leverage the strong correlations between the predicate and the ⟨subject, object⟩ pair (both semantically and spatially) to predict predicates conditioned on the subjects and the objects. Modeling the three entities jointly more accurately reflects their relationships compared to modeling them independently, but it complicates learning since the semantic space of visual relationships…
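The abstract describes conditioning predicate prediction on the ⟨subject, object⟩ pair together with visual evidence. The following is a minimal illustrative sketch (in PyTorch), not the authors' model: subject and object category embeddings are concatenated with a visual feature of the pair before scoring predicates. All class names, dimensions, and layer sizes are placeholders.

```python
# Hypothetical sketch of conditioning predicate scores on the (subject, object) pair.
import torch
import torch.nn as nn

class PredicateScorer(nn.Module):
    def __init__(self, num_classes, num_predicates, vis_dim=512, emb_dim=64):
        super().__init__()
        # Shared embedding table for subject and object categories.
        self.obj_emb = nn.Embedding(num_classes, emb_dim)
        self.fc = nn.Sequential(
            nn.Linear(vis_dim + 2 * emb_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_predicates),   # one logit per predicate
        )

    def forward(self, vis_feat, subj_cls, obj_cls):
        # vis_feat: (B, vis_dim) feature of the subject-object region
        # subj_cls, obj_cls: (B,) integer class ids of the detected objects
        x = torch.cat([vis_feat, self.obj_emb(subj_cls), self.obj_emb(obj_cls)], dim=-1)
        return self.fc(x)                     # (B, num_predicates) predicate logits
```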

Cited by 290 publications (287 citation statements). References 24 publications.
“…Contextual information between objects is also used in [23], [35], with different learning methods. In [31] the background knowledge (from the training set and Wikipedia) is a probability distribution of a relationship given the subject/object. This knowledge drives the learning of visual relationships.…”
Section: Related Work (mentioning, confidence: 99%)
“…Visual relationships are mainly detected with supervised learning techniques [31]. These require large training sets of images annotated with bounding boxes and relationships [14], [19].…”
Section: Introduction (mentioning, confidence: 99%)
“…Considering the fact that visual features provide limited knowledge for distinguishing predicates, many works focus on introducing features from other modalities into the predicate recognition stage. For example, [40,41,45,47] show that language priors and location information describing the categories and locations of object pairs are effective in improving visual relationship recognition. However, compared to language priors, relative location information, which provides a strong cue for inferring relationships, is not fully exploited in those works.…”
Section: Related Work (mentioning, confidence: 99%)
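The relative location information mentioned above is typically derived from the two bounding boxes of an object pair. The sketch below shows one common encoding (normalized offsets plus log size ratios); it is illustrative only, and the exact features vary between papers.

```python
import math

def relative_location(subj_box, obj_box):
    """Relative spatial features between two boxes given as [x1, y1, x2, y2].

    An illustrative encoding: offsets of the object box normalized by the
    subject box size, plus log width/height ratios.
    """
    sx1, sy1, sx2, sy2 = subj_box
    ox1, oy1, ox2, oy2 = obj_box
    sw, sh = sx2 - sx1, sy2 - sy1
    ow, oh = ox2 - ox1, oy2 - oy1
    return [(ox1 - sx1) / sw, (oy1 - sy1) / sh,
            math.log(ow / sw), math.log(oh / sh)]
```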
“…R@n computes the recall using the top n object-pair proposals' predictions in one image. Following [40], we also set a hyperparameter k, meaning that the top k predictions per object pair are taken into consideration. In the visual relationship detection task, R@n,k=1 is equivalent to R@n in [22].…”
Section: Evaluation Metrics (mentioning, confidence: 99%)
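The R@n,k metric described above can be sketched as follows. This is a simplified illustration under assumed data structures: box localization (IoU matching against ground-truth boxes), which a full evaluation also requires, is ignored here for brevity.

```python
# Simplified illustration of Recall@n with a per-pair cap k (R@n,k).
from collections import defaultdict

def recall_at_n(predictions, ground_truth, n, k):
    """predictions: list of (pair_id, triple, score); ground_truth: set of triples."""
    per_pair = defaultdict(list)
    for pair_id, triple, score in predictions:
        per_pair[pair_id].append((score, triple))
    kept = []
    for scored in per_pair.values():
        scored.sort(reverse=True)                   # best predicates first
        kept.extend(scored[:k])                     # keep top k per object pair
    kept.sort(reverse=True)
    top_n = {triple for _, triple in kept[:n]}      # top n triples in the image
    hits = sum(1 for gt in ground_truth if gt in top_n)
    return hits / max(len(ground_truth), 1)
```

With k=1 only the single best predicate per object pair survives, which matches the remark that R@n,k=1 reduces to the R@n protocol of [22].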