2021
DOI: 10.1609/aaai.v35i2.16217

DIRV: Dense Interaction Region Voting for End-to-End Human-Object Interaction Detection

Abstract: In recent years, human-object interaction (HOI) detection has achieved impressive advances. However, conventional two-stage methods are usually slow in inference. On the other hand, existing one-stage methods mainly focus on the union regions of interactions, which introduce unnecessary visual information as disturbances to HOI detection. To tackle the problems above, we propose a novel one-stage HOI detection approach DIRV in this paper, based on a new concept called interaction region for the HOI problem. Unlik…

Cited by 31 publications (6 citation statements)
References 35 publications
“…Though effective, they suffer from expensive computation due to the sequential inference architecture [17], and are highly dependent on prior detection results. In contrast, the one-stage methods [58][59][60][61] jointly detect human-object pairs and classify the interactions in an end-to-end manner by associating humans and objects with predefined anchors, which can be union boxes [58,61] or interaction points [59,60]. Despite featuring fast inference, they heavily rely on hand-crafted post-processing to associate interactions with object detection results [10].…”
Section: Introduction; citation type: mentioning
confidence: 99%
“…Parallel one-stage framework. [67][68][69][70] often apply a parallel structure to generate box candidates and predict interaction points. In the end, a matching module will aggregate the results from parallel branches and form the final HOI tuples.…”
Section: Detection Framework; citation type: mentioning
confidence: 99%
“…(1) One-stage framework: inspired by existing methods [67][68][69][70][73][74][75][76][77], we aim to use one unified framework to accomplish box detection and interaction recognition. (2) Detection with external knowledge: we aim to leverage human pose knowledge [19,60,61,63,85] and natural language knowledge [21,25,81,82] to address the HOI detection problem.…”
Section: Chapter Summary; citation type: mentioning
confidence: 99%