Detecting human—object interaction with multi-level pairwise feature network

Liu, Hanchao; Mu, Tai Jiang; Huang, Xiaolei

doi:10.1007/s41095-020-0188-2

Cited by 19 publications

(9 citation statements)

References 36 publications


“…Therefore, the spatial distances must be normalized. Following the method in [59], we normalized the spatial distances between each keypoint using Equations (14).…”

Section: Spatial Distance Between Human Body Parts and Interacting ... (mentioning)

confidence: 99%
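The quoted passage normalizes keypoint distances following [59], but Equation (14) itself is not reproduced on this page. The sketch below shows the general idea of scale-normalized keypoint-to-object distances. It makes three assumptions: boxes are given as (x1, y1, x2, y2); distances are measured to the object-box center; and the human-box diagonal is the normalizer. The normalizer is a hypothetical choice, not necessarily the cited Equation (14), and `normalized_keypoint_distances` is an illustrative name.

```python
import math


def normalized_keypoint_distances(keypoints, obj_box, human_box):
    """Distance from each (x, y) keypoint to the object-box centre, divided by
    the human-box diagonal. Hypothetical normalizer, not the cited Equation (14).
    Boxes are assumed to be (x1, y1, x2, y2)."""
    ox = (obj_box[0] + obj_box[2]) / 2.0
    oy = (obj_box[1] + obj_box[3]) / 2.0
    diag = math.hypot(human_box[2] - human_box[0], human_box[3] - human_box[1])
    if diag == 0:
        raise ValueError("degenerate human box")
    # Dividing by a length from the same image removes the dependence on image scale.
    return [math.hypot(x - ox, y - oy) / diag for x, y in keypoints]
```

Every distance is divided by a length measured in the same image, so rescaling the whole image leaves the features unchanged. That scale invariance is what the normalization is meant to provide.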

“…However, as research has progressed, more emphasis on interaction details has become important. To address this, various methods have been introduced, such as attention mechanisms [9], [10], [11], [12], context information [13], [14], graph convolutional neural networks [15], [16], [17], [18], [19], body parts, and poses [20], [21], [22], [23], [24] to enhance the focus on local details within images. Specifically, the construction of context appearance features has become crucial, in addition to the visual and spatial features of humans and objects.…”

Section: Introduction (mentioning)

confidence: 99%

“…To address these issues, we believe that integrating human body posture and body part spatial information, as well as local facial details, into embedding a graph provides more interpretable information. While previous graph-based methods have attempted to encode human body posture and body part spatial information into node embeddings [18], [14], [43], [44], [45], insufficient attention has been paid to body parts and posture features for interacting objects. In addition, facial part information was not encoded into the nodes of the graph.…”

Section: Introduction (mentioning)

confidence: 99%


Parallel Multi-Head Graph Attention Network (PMGAT) Model for Human-Object Interaction Detection

Zhang, Yunos, Haron

2023

IEEE Access


Human-object interaction (HOI) detection is an advanced task in computer vision and is crucial for deep scene understanding. However, current HOI detection models face two serious challenges: first, they rely too heavily on appearance features and neglect the local details of human-object interactions; second, the training cost of existing detection models is high. To overcome these challenges, this study proposes a Parallel Multi-Head Graph Attention Network (PMGAT) model for detecting human-object interaction correlations. First, recognizing the close relationship of facial landmarks and body keypoints with objects, a local feature module is introduced to construct a relational graph model among facial keypoints, body keypoints, and objects. A multi-head graph attention network is used to capture the interaction correlations between keypoints accurately, addressing the neglect of local details. Furthermore, a global feature module is designed to extract absolute spatial pose features and relative spatial pose features based on the positions of human keypoints relative to objects, enabling a more in-depth extraction of human-object interactions. To reduce training cost, the model adopts a multi-branch parallel structure and employs a multi-threaded, multi-GPU scheme for parallel training acceleration. The empirical results demonstrate that PMGAT outperforms the current state-of-the-art ViPLO method in mAP on the V-COCO and HICO-DET datasets. On V-COCO, it improves on ViPLO by up to 0.8% mAP, while on the more demanding HICO-DET the improvement reaches up to 1.47% mAP. PMGAT also requires less training time than existing approaches. Overall, these results confirm that PMGAT improves both accuracy and training efficiency.
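The abstract describes a multi-head graph attention network over a relational graph of facial keypoints, body keypoints, and objects. PMGAT's exact layer is not specified here, so the sketch below is a generic multi-head graph attention layer in the style of Veličković et al.'s GAT, written in plain Python so it runs without dependencies. The function names, the hand-picked weights, and the fully connected three-node graph in the usage lines are hypothetical. PMGAT's parallel branches and multi-GPU training are not shown.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, v):
    # Dense matrix-vector product; W is a list of rows.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gat_head(features, adj, W, a):
    """One graph-attention head (generic GAT-style; not PMGAT's exact layer).

    features: list of node feature vectors (e.g. face keypoints, body keypoints, objects)
    adj:      adj[i] lists the neighbours of node i; include i itself for a self-loop
    W:        projection matrix, out_dim rows x in_dim columns
    a:        attention vector of length 2 * out_dim
    """
    z = [matvec(W, h) for h in features]  # projected features W h_i
    out_dim = len(W)
    out = []
    for i, nbrs in enumerate(adj):
        # Unnormalised scores e_ij = LeakyReLU(a^T [z_i || z_j]).
        scores = [leaky_relu(sum(ak * v for ak, v in zip(a, z[i] + z[j]))) for j in nbrs]
        m = max(scores)  # subtract the max before exp for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]  # softmax over the neighbourhood of i
        # Attention-weighted sum of neighbour features (output nonlinearity omitted).
        out.append([sum(al * z[j][d] for al, j in zip(alphas, nbrs)) for d in range(out_dim)])
    return out


def multi_head_gat(features, adj, heads):
    """Apply each (W, a) head independently and concatenate the results per node."""
    per_head = [gat_head(features, adj, W, a) for W, a in heads]
    return [sum((h[i] for h in per_head), []) for i in range(len(features))]


# Hypothetical 3-node graph: one facial keypoint, one body keypoint, one object.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adj = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]  # fully connected with self-loops
identity = [[1.0, 0.0], [0.0, 1.0]]
heads = [(identity, [0.0] * 4), (identity, [0.0, 0.0, 1.0, 1.0])]
out = multi_head_gat(feats, adj, heads)  # 3 nodes x (2 heads * 2 dims)
```

With an all-zero attention vector, the first head attends uniformly, so every node receives the mean of its neighbours. The second head's attention vector favours neighbours with larger feature sums, so it weights the object node more heavily. Using several heads lets each one learn a different pattern of which keypoints matter for an interaction.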


“…Many HOI recognition systems have been proposed in recent years comprising of both deep learning [18,19,20] and machine learning based approaches [21]. However, in our proposed work, we have developed a machine learning based multi-vision sensors system that incorporates a semantic segmentation technique.…”

Section: Related Work (mentioning)

confidence: 99%

Semantic Recognition of Human-Object Interactions via Gaussian-Based Elliptical Modeling and Pixel-Level Labeling

et al. 2021


Human-Object Interaction (HOI) recognition, due to its significance in many computer-vision-based applications, requires in-depth and meaningful details from image sequences. Incorporating semantics in scene understanding has led to a deep understanding of human-centric actions. Therefore, in this research work, we propose a semantic HOI recognition system based on multi-vision sensors. In the proposed system, the RGB and depth images, de-noised via Bilateral Filtering (BLF), are segmented into multiple clusters using a Simple Linear Iterative Clustering (SLIC) algorithm. The skeleton is then extracted from the segmented RGB and depth images via the Euclidean Distance Transform (EDT). Human joints, extracted from the skeleton, provide the annotations for accurate pixel-level labeling. An elliptical human model is then generated via a Gaussian Mixture Model (GMM). A Conditional Random Field (CRF) model is trained to allocate a specific label to each pixel of the different human body parts and the interaction object. Two semantic feature types are extracted from each labeled human body part and each labeled object: fiducial points and 3D point clouds. Feature descriptors are quantized using Fisher's Linear Discriminant Analysis (FLDA) and classified using K-ary Tree Hashing (KATH). In the experiments, the system achieves recognition accuracies of 92.88% on the Sports dataset, 93.5% on the Sun Yat-Sen University (SYSU) 3D HOI dataset, and 94.16% on the Nanyang Technological University (NTU) RGB+D dataset. The proposed system is validated via extensive experimentation and should be applicable to many computer-vision-based applications such as healthcare monitoring, security systems, and assisted living.

INDEX TERMS: 3D point cloud, fiducial points, human-object interaction, pixel labeling, semantic segmentation, super-pixels, K-ary tree hashing.


“…The method based on local instance mainly analyzes the intrinsic relation between human and object from the local features such as bones, parts, and postures of the object subject. In order to extract more fine-grained information, Liu et al [4] constructed a body part-based dataset HAKE and proposed a multi-level pairwise feature network (PFNet). Zhong et al [5] proposed the glance and gaze network (GGNet), which adaptively models a set of action perception points through two steps of glance and gaze.…”

Section: Introduction (mentioning)

confidence: 99%

Human-object interaction detection based on graph model

Ye, Xiu-ju

2023

Third International Conference on Artificial Intelligence and Computer Engineering (ICAICE 2022)


Human-Object Interaction (HOI) detection is a fundamental task for understanding real-world scenes. In this paper, a graph model-based human-object interaction detection algorithm is proposed, which aims to make full use of the visual-spatial features and semantic information of human-object instances in the image, thereby improving the accuracy of interaction detection. Aiming at the characteristics of visual-spatial features and semantic information, we take the visual features of human and object instance boxes as nodes, and the corresponding spatial features of interaction relations as edges to construct an initial dense graph, and adaptively update the graph through the spatial and semantic information of instances. The V-COCO dataset is used to evaluate the algorithm, and the final accuracy is significantly improved, which proves the effectiveness of the algorithm.
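The abstract builds an initial dense graph. Its nodes carry the visual features of human and object boxes, and its edges carry the spatial features of each pair. The paper's exact spatial encoding and its adaptive update rule are not stated on this page. The sketch below therefore assumes boxes in (x1, y1, x2, y2) form and uses a common box-pair encoding: center offsets normalized by the first box's size, plus log width and height ratios. The function names are hypothetical.

```python
import math


def pair_spatial_feature(b1, b2):
    """Encode box b2 relative to box b1; boxes are assumed to be (x1, y1, x2, y2)."""
    w1, h1 = b1[2] - b1[0], b1[3] - b1[1]
    w2, h2 = b2[2] - b2[0], b2[3] - b2[1]
    if min(w1, h1, w2, h2) <= 0:
        raise ValueError("boxes must have positive width and height")
    cx1, cy1 = (b1[0] + b1[2]) / 2.0, (b1[1] + b1[3]) / 2.0
    cx2, cy2 = (b2[0] + b2[2]) / 2.0, (b2[1] + b2[3]) / 2.0
    # Offsets are divided by b1's size so the feature is scale-invariant.
    return [(cx2 - cx1) / w1, (cy2 - cy1) / h1, math.log(w2 / w1), math.log(h2 / h1)]


def build_dense_graph(human_boxes, object_boxes, visual_feats):
    """Nodes: one visual feature vector per box (humans first, then objects).
    Edges: a spatial feature for every ordered pair of distinct nodes."""
    boxes = list(human_boxes) + list(object_boxes)
    if len(visual_feats) != len(boxes):
        raise ValueError("need one visual feature vector per box")
    edges = {
        (i, j): pair_spatial_feature(boxes[i], boxes[j])
        for i in range(len(boxes))
        for j in range(len(boxes))
        if i != j
    }
    return list(visual_feats), edges
```

For one human box (0, 0, 2, 2) and one object box (2, 0, 4, 2), the edge from human to object is [1.0, 0.0, 0.0, 0.0]. In other words, the object sits one human-width to the right and has the same size. The adaptive update from spatial and semantic information that the abstract describes would then reweight or prune these edges; that step is not sketched here.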

