2020
DOI: 10.48550/arxiv.2010.10001
Preprint

Contextual Heterogeneous Graph Network for Human-Object Interaction Detection

Abstract: Human-object interaction (HOI) detection is an important task for understanding human activity. A graph structure is appropriate for denoting the HOIs in a scene. Since there is a subordination between human and object (the human plays the subjective role and the object plays the objective role in an HOI), the relations between homogeneous entities and heterogeneous entities in the scene should not be treated identically. However, previous graph models regard human and object as the same kind of nodes and do not consider that the me…
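
As a rough illustration of the idea in the abstract, here is a minimal sketch of heterogeneous message passing that treats human and object nodes as distinct types, with separate intra-class (human-human, object-object) and inter-class (human-object) message functions. This is not the authors' released code; the layer sizes and the uniform mean aggregation are illustrative assumptions (CHG itself learns contextual edge weights).

```python
# Minimal sketch of heterogeneous graph message passing for HOI, assuming
# one message function per relation type. Not the authors' code; dimensions
# and uniform mean aggregation are illustrative choices.
import torch
import torch.nn as nn

class HeteroGraphLayer(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.w_hh = nn.Linear(dim, dim)  # human -> human (intra-class)
        self.w_oo = nn.Linear(dim, dim)  # object -> object (intra-class)
        self.w_oh = nn.Linear(dim, dim)  # object -> human (inter-class)
        self.w_ho = nn.Linear(dim, dim)  # human -> object (inter-class)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, h, o):
        """h: (Nh, dim) human node features; o: (No, dim) object node features."""
        # Aggregate intra- and inter-class context separately, then fuse.
        # Uniform mean pooling stands in for CHG's learned edge weights.
        msg_h = torch.cat([self.w_hh(h).mean(0).expand_as(h),
                           self.w_oh(o).mean(0).expand_as(h)], dim=-1)
        msg_o = torch.cat([self.w_oo(o).mean(0).expand_as(o),
                           self.w_ho(h).mean(0).expand_as(o)], dim=-1)
        return torch.relu(h + self.fuse(msg_h)), torch.relu(o + self.fuse(msg_o))

layer = HeteroGraphLayer(dim=512)
h, o = layer(torch.randn(3, 512), torch.randn(5, 512))  # 3 humans, 5 objects
```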

Cited by 6 publications (7 citation statements) | References 37 publications

“…Chao et al (2018) proposed a multi-stream framework, followed by subsequent works (Gao, Zou, and Huang 2018; Li et al 2019c; Gao et al 2020; Hou et al 2020). Qi et al (2018) and Wang, Zheng, and Yingbiao (2020) used graphical models to address HOI detection. Gkioxari et al (2018) estimated the locations of interacted objects.…”
Section: Related Work (mentioning)
confidence: 99%
“…DRG (Gao et al 2020) recognized that most methods ignore the contextual information from other interactions in the scene and predict the interaction of each human-object pair in isolation, so a dual graph is proposed to enable knowledge transfer among objects. CHG (Wang, Zheng, and Yingbiao 2020) found that relations between homogeneous entities and heterogeneous entities should not be treated identically, so a contextual heterogeneous graph is built to model the intra-class context and inter-class context, which is more elaborate than (Zhou and Chi 2019). Most recently, (Zou et al 2021; Tamura, Ohashi, and Yoshinaga 2021; Chen et al 2021; Zhang et al 2021) reason about the interactions between humans and objects from global image context with transformer architectures. In training, the original HOI triplet is first converted to a relational phrase and then sent to a label composition module to generate new phrases; after that, each phrase is encoded into a phrase embedding used as ground truth (GT) to supervise the prediction embedding.…”
Section: Related Work (mentioning)
confidence: 99%
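
The phrase-supervision pipeline described in the statement above can be sketched as follows. This is a hypothetical illustration, not the cited paper's code: the `person` subject, the tiny verb/object vocabularies, and the `encode` placeholder (standing in for whatever pretrained text encoder the cited work uses) are all assumptions.

```python
# Hypothetical sketch of phrase-based label composition for HOI training:
# a triplet becomes a relational phrase, extra phrases are composed by
# re-pairing known verbs/objects, and phrase embeddings supervise the model.
import itertools
import random

def triplet_to_phrase(subj, verb, obj):
    return f"{subj} {verb} {obj}"  # e.g. "person ride bicycle"

def compose_labels(verb, obj, verbs, objs, k=2):
    # Re-pair labels from the known vocabulary to generate new phrases.
    pool = [(v, o) for v, o in itertools.product(verbs, objs)
            if (v, o) != (verb, obj)]
    return [triplet_to_phrase("person", v, o) for v, o in random.sample(pool, k)]

phrases = [triplet_to_phrase("person", "ride", "bicycle")]
phrases += compose_labels("ride", "bicycle",
                          verbs=["ride", "hold"], objs=["bicycle", "horse"])
# gt_embeddings = [encode(p) for p in phrases]  # supervise predicted embeddings
```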
“…A two-channel binary image representation was first advocated in iCAN [7] to encode the spatial relation, but FCM-Net [22] proposed a fine-grained version derived from human parsing to amplify the key cues. Apart from spatial relations, the graph neural networks in DRG [6], CHG [34], and RPNN [39] were proposed to explicitly model the interactions between humans and objects, which improved the models' representation capability.…”
Section: Related Work 2.1 Two-stage HOI Detection (mentioning)
confidence: 99%
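
The two-channel binary spatial encoding mentioned above can be sketched as follows. The map size and the normalization to the union box are common conventions but are assumptions here, not a transcription of iCAN's implementation.

```python
# Sketch of a two-channel binary "interaction pattern": rasterize the human
# and object boxes, rescaled to their union box, into a 2 x size x size map.
import numpy as np

def spatial_map(human_box, object_box, size=64):
    """Boxes are (x0, y0, x1, y1); returns a (2, size, size) binary map
    with channel 0 = human box, channel 1 = object box."""
    boxes = np.array([human_box, object_box], dtype=np.float32)
    ux0, uy0 = boxes[:, :2].min(axis=0)  # union box corners
    ux1, uy1 = boxes[:, 2:].max(axis=0)
    scale = np.array([size / (ux1 - ux0), size / (uy1 - uy0)] * 2)
    boxes = (boxes - np.array([ux0, uy0, ux0, uy0])) * scale
    out = np.zeros((2, size, size), dtype=np.float32)
    for c, (x0, y0, x1, y1) in enumerate(boxes.astype(int)):
        out[c, y0:max(y1, y0 + 1), x0:max(x1, x0 + 1)] = 1.0
    return out

pattern = spatial_map([30, 40, 120, 300], [100, 250, 200, 330])  # (2, 64, 64)
```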
“…
Method            Backbone        AP_role
Two-stage methods
VSRL [11]         ResNet-50-FPN   31.8
InteractNet [9]   ResNet-50-FPN   40.0
GPNN [27]         ResNet-101      44.0
RPNN [39]         ResNet-50       47.5
VCL [14]          ResNet-101      48.3
TIN* [18]         ResNet-50       48.7
Zhou et al [40]   ResNet-50       48.9
PastaNet [17]     ResNet-50       51.0
DRG [6]           ResNet-50-FPN   51.0
VSGNet [31]       ResNet-152      51.8
CHG [34]          ResNet-50       52.7
PMFNet [33]       ResNet-50-FPN   52.0
PD-Net [38]       ResNet-152      52.6
FCMNet [22]       ResNet-50       53.1
ACP* [15]         ResNet-152      53.2
One-stage methods
UnionDet [3]      ResNet-50-FPN   47.5
IPNet [35]        Hourglass-104   51.0
IPNet* [35]       Hourglass-104   52.3
Ours              ResNet-101      52.9

We conduct experiments to evaluate the relative importance of 'object' and 'interaction'. In Eq.…”
Section: Backbone (mentioning)
confidence: 99%