Proceedings of the 28th ACM International Conference on Multimedia 2020
DOI: 10.1145/3394171.3413600

ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection

Abstract: We consider the problem of Human-Object Interaction (HOI) Detection, which aims to locate and recognize HOI instances in the form of ⟨human, action, object⟩ in images. Most existing works treat HOIs as individual interaction categories, thus can not handle the problem of long-tail distribution and polysemy of action labels. We argue that multi-level consistencies among objects, actions and interactions are strong cues for generating semantic representations of rare or previously unseen HOIs. Leveraging the compositional and …

Cited by 74 publications (37 citation statements)
References 38 publications
“…On the one hand, most of the previous methods adopt a two-stage strategy [10], [8], [3], [16], [29], [26], [27], [20], [24]. During the first stage, they use an external object detector as [9] to point out interacting candidates, and then, during the second stage, another network is dedicated to estimate interactions between the proposals.…”
Section: B. HOI Detection Methods (mentioning)
confidence: 99%
“…More recently, improvement in the second stage have been proposed by using additional information in the image. For instance, [16], [27], [20] use human pose to have a finer analysis of the posture of the interaction subject. Other methods add word embedding [24], [21] or segmentation [29].…”
Section: B. HOI Detection Methods (mentioning)
confidence: 99%
“…Human-Object Interaction: With the introduction of benchmark datasets like V-COCO [18] and HICO-DET [6], there is a plethora of works detecting human-object interactions [11,14,30,33,47,51,13,44,38,57,22,34,20,49,28]. Earlier works [17] in this area focus on the visual features of humans and objects.…”
Section: Related Work (mentioning)
confidence: 99%
“…Recently, graph-based architectures where humans and objects are considered nodes attempt to understand spatial context [13,49] for the structural relations. ConsNet [34] has leveraged word embeddings of the objects in this graph structure. Moreover, IDN [28] considered HOI as a transformation process over humans and objects.…”
Section: Related Work (mentioning)
confidence: 99%