2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019
DOI: 10.1109/iccv.2019.00207
Detecting Unseen Visual Relations Using Analogies

Abstract: We seek to detect visual relations in images of the form of triplets t = (subject, predicate, object), such as "person riding dog", where training examples of the individual entities are available but their combinations are unseen at training. This is an important set-up due to the combinatorial nature of visual relations: collecting sufficient training data for all possible triplets would be very hard. The contributions of this work are three-fold. First, we learn a representation of visual relations that co…

Cited by 122 publications (150 citation statements)
References 31 publications (60 reference statements)
“…model [33] have shown that using both unigram and trigram representations of HOIs may solve the above contradiction. Nonetheless, all these methods ignore the implicit relations among HOI categories, thus we extend the hybrid model by aggregating common sense knowledge for generating semantic embeddings.…”
Section: Related Work
Confidence: 99%
“…Video Visual Relation Detection. Compared to ImgVRD [5,13,15,33-35], VidVRD did not receive sufficient attention until recently, due to its complexity and a lack of suitable datasets. [19] contributed the ImageNet-VidVRD dataset, which labels all relation triplets in videos as well as the trajectories of the corresponding subjects and objects, and is the first dataset for video visual relation detection.…”
Section: Related Work
Confidence: 99%
“…Unlike visual relation detection in images (ImgVRD), which has been widely studied for years [5,13,15,33-35], its counterpart in the video domain has only recently attracted researchers' attention [16,19,23]. Video visual relation detection (VidVRD) requires tracking the objects and their pairwise relations in a video.…”
Section: Introduction
Confidence: 99%
“…A single predicate could introduce up to 20² new relationship categories, for which samples must be collected and models trained. Moreover, we know that the distribution of naturally-occurring triplets is long-tailed, with combinations such as "person ride dog" rarely appearing [29]. This exposes standard training methods to issues arising from extreme class imbalance.…”
Section: Introduction
Confidence: 99%
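The combinatorial growth described in the last statement can be illustrated with a small sketch. The entity and predicate counts below are illustrative assumptions (the "20" matches the quoted 20² example, not a figure verified from the cited papers):

```python
# Sketch: combinatorial growth of (subject, predicate, object) triplet categories.
# Counts are hypothetical, chosen only to illustrate the scaling argument.
num_entities = 20    # assumed number of subject/object categories
num_predicates = 50  # assumed number of predicate categories

# Each new predicate can pair any subject with any object,
# so it introduces up to num_entities**2 new triplet categories.
triplets_per_predicate = num_entities ** 2            # 20**2 = 400
total_triplets = num_predicates * triplets_per_predicate  # 50 * 400 = 20000

print(triplets_per_predicate)
print(total_triplets)
```

This quadratic (per predicate) growth is why collecting training examples for every triplet is impractical, and why most triplet categories end up rare or entirely unseen at training time.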