2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021
DOI: 10.1109/cvpr46437.2021.01441

Detecting Human-Object Interaction via Fabricated Compositional Learning

Abstract: Human-Object Interaction (HOI) detection, inferring the relationships between human and objects from images/videos, is a fundamental task for high-level scene understanding. However, HOI detection usually suffers from the open long-tailed nature of interactions with objects, while human has extremely powerful compositional perception ability to cognize rare or unseen HOI samples. Inspired by this, we devise a novel HOI compositional learning framework, termed as Fabricated Compositional Learning (FCL), to addr…

Citations: cited by 76 publications (33 citation statements)
References: 52 publications
“…Some works also exploit graph structure to enhance object dependencies [27,28,31,33,40]. Another bunch of two-stage methods is the compositional approaches [13,14,15,20], which disentangle HOI representations by learning from fabricated compositional HOIs. In contrast, our method disentangles representations by disentangled task encoders and decoders and its one-stage framework does not rely on pre-computed object proposals.…”
Section: Two-stage Methods (citation type: mentioning)
confidence: 99%
“…To compose HOI triplets, it generates additional associative embeddings to match the interactions and instances. Since HOI detection is a composition problem [13,15], the latter decomposing strategy has several advantages compared with unified multi-tasking strategy. First, two sub-task decoders might attend to different regions via cross attention to facilitate learning and also results in better interpretability.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…For UC, compared with FCL [15], GEN-VLKT_s achieves 38.85% and 22.41% relative mAP gains on full categories for rare first and non-rare first selections, respectively. Specifically, benefiting from the VLKT mechanism, GEN-VLKT_s still significantly promotes the performance for the unseen categories without the feature factorization and composition among images like VCL [13], FCL [15] and ATL [14]. The improvements mainly come from the utilization of CLIP, as indicated by comparing GEN-VLKT_s to the baseline.…”
Section: Effectiveness for Zero-shot HOI Detection (citation type: mentioning)
confidence: 99%
“…VCL [13] composed novel HOI samples by combining decomposed object and verb features with pair-wise images and within images. FCL [15] presented an object fabricator to generate fake object representations for rare and unseen HOIs. ATL [14] explored object affordances from additional object images to discover novel HOI categories.…”
Section: Interaction Scores (citation type: mentioning)
confidence: 99%
“…HOI, is of particular interest to understand the interactions of humans and other objects. A lot of HOI benchmarks, such as HICO [4], COCOa [39], vCOCO [10], and HOI-COCO [14], are built on top of the object categories provided in the COCO dataset [27].…”
Section: Related Work (citation type: mentioning)
confidence: 99%