ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection

Liu, Ye; Yuan, Junsong; Chen, Chang Wen

doi:10.1145/3394171.3413600

Cited by 74 publications

(37 citation statements)

References 38 publications


“…On the one hand, most of the previous methods adopt a two-stage strategy [10], [8], [3], [16], [29], [26], [27], [20], [24]. During the first stage, they use an external object detector as [9] to point out interacting candidates, and then, during the second stage, another network is dedicated to estimate interactions between the proposals.…”

Section: B. HOI Detection Methods (mentioning)

confidence: 99%


Detecting Human-to-Human-or-Object (H2O) Interactions with DIABOLO

Orcesi,

Audigier,

Toukam

et al. 2022

Preprint


Detecting human interactions is crucial for human behavior analysis. Many methods have been proposed to deal with Human-to-Object Interaction (HOI) detection, i.e., detecting in an image which person and object interact together and classifying the type of interaction. However, Human-to-Human Interactions, such as social and violent interactions, are generally not considered in available HOI training datasets. As we think these types of interactions cannot be ignored and decorrelated from HOI when analyzing human behavior, we propose a new interaction dataset to deal with both types of human interactions: Human-to-Human-or-Object (H2O). In addition, we introduce a novel taxonomy of verbs, intended to be closer to a description of human body attitude in relation to the surrounding targets of interaction, and more independent of the environment. Unlike some existing datasets, we strive to avoid defining synonymous verbs when their use highly depends on the target type or requires a high level of semantic interpretation. As H2O dataset includes V-COCO images annotated with this new taxonomy, images obviously contain more interactions. This can be an issue for HOI detection methods whose complexity depends on the number of people, targets or interactions. Thus, we propose DIABOLO (Detecting InterActions By Only Looking Once), an efficient subject-centric single-shot method to detect all interactions in one forward pass, with constant inference time independent of image content. In addition, this multi-task network simultaneously detects all people and objects. We show how sharing a network for these tasks does not only save computation resource but also improves performance collaboratively. Finally, DIABOLO is a strong baseline for the new proposed challenge of H2O-Interaction detection, as it outperforms all state-of-the-art methods when trained and evaluated on HOI dataset V-COCO. We hope that this new dataset and new baseline will foster future research.
H2O is available on https://kalisteo.cea.fr/.



“…More recently, improvement in the second stage have been proposed by using additional information in the image. For instance, [16], [27], [20] use human pose to have a finer analysis of the posture of the interaction subject. Other methods add word embedding [24], [21] or segmentation [29].…”

Section: B. HOI Detection Methods (mentioning)

confidence: 99%

Detecting Human-to-Human-or-Object (H2O) Interactions with DIABOLO

Orcesi,

Audigier,

Toukam

et al. 2022

Preprint


“…Human-Object Interaction: With the introduction of benchmark datasets like V-COCO [18] and HICO-DET [6], there is a plethora of works detecting human-object interactions [11,14,30,33,47,51,13,44,38,57,22,34,20,49,28]. Earlier works [17] in this area focus on the visual features of humans and objects.…”

Section: Related Work (mentioning)

confidence: 99%

“…Recently, graph-based architectures where humans and objects are considered nodes attempt to understand spatial context [13,49] for the structural relations. ConsNet [34] has leveraged word embeddings of the objects in this graph structure. Moreover, IDN [28] considered HOI as a transformation process over humans and objects.…”

Section: Related Work (mentioning)

confidence: 99%


GTNet: Guided Transformer Network for Detecting Human-Object Interactions

Iftekhar,

Kumar,

McEver

et al. 2021

Preprint


The human-object interaction (HOI) detection task refers to localizing humans, localizing objects, and predicting the interactions between each human-object pair. HOI is considered one of the fundamental steps in truly understanding complex visual scenes. For detecting HOI, it is important to utilize relative spatial configurations and object semantics to find salient spatial regions of images that highlight the interactions between human object pairs. This issue is addressed by the proposed self-attention based guided transformer network, GTNet. GTNet encodes this spatial contextual information in human and object visual features via self-attention while achieving a 4%-6% improvement over previous state of the art results on both the V-COCO [18] and HICO-DET [6] datasets. Code will be made available online.


Towards Hard-Positive Query Mining for DETR-Based Human-Object Interaction Detection

Zhong

Ding

et al. 2022

Lecture Notes in Computer Science

