2020
DOI: 10.1007/s41095-020-0188-2

Detecting human–object interaction with multi-level pairwise feature network

Abstract: Human–object interaction (HOI) detection, which aims to infer ⟨human, action, object⟩ triplets within an image, is crucial for human-centric image understanding. Recent studies often exploit visual features and the spatial configuration of a human–object pair to learn the action linking the human and object in the pair. We argue that such a paradigm of pairwise feature extraction and action inference can be applied not only at the whole human and object instance level, but also at the part level at whic…

Cited by 19 publications (9 citation statements)
References 36 publications

“…Therefore, the spatial distances must be normalized. Following the method in [59], we normalized the spatial distances between each keypoint using Equations (14).…”
Section: Spatial Distance Between Human Body Parts and Interacting ... (mentioning)
confidence: 99%
“…However, as research has progressed, more emphasis on interaction details has become important. To address this, various methods have been introduced, such as attention mechanisms [9], [10], [11], [12], context information [13], [14], graph convolutional neural networks [15], [16], [17], [18], [19], body parts, and poses [20], [21], [22], [23], [24] to enhance the focus on local details within images. Specifically, the construction of context appearance features has become crucial, in addition to the visual and spatial features of humans and objects.…”
Section: Introduction (mentioning)
confidence: 99%
“…Many HOI recognition systems have been proposed in recent years comprising of both deep learning [18,19,20] and machine learning based approaches [21]. However, in our proposed work, we have developed a machine learning based multi-vision sensors system that incorporates a semantic segmentation technique.…”
Section: Related Work (mentioning)
confidence: 99%
“…The method based on local instance mainly analyzes the intrinsic relation between human and object from the local features such as bones, parts, and postures of the object subject. In order to extract more fine-grained information, Liu et al [4] constructed a body part-based dataset HAKE and proposed a multi-level pairwise feature network (PFNet). Zhong et al [5] proposed the glance and gaze network (GGNet), which adaptively models a set of action perception points through two steps of glance and gaze.…”
Section: Introduction (mentioning)
confidence: 99%