Action Detection System Based on Pose Information

Kawai, Rie; Yoshida, Noboru; Liu, Jianquan

doi:10.1145/3551626.3564974

Cited by 1 publication (1 citation statement)

References 11 publications

“…Its main goal is to locate a set of anatomical keypoints that correspond to the human body's joints and limbs in an image. HPE has been well studied (Guo 2020; Zhang et al. 2021; Chang et al. 2020) and forms the foundation for many downstream tasks such as action recognition (Kawai, Yoshida, and Liu 2022; Chao et al. 2017; Xu et al. 2022a; Duan et al. 2022) and abnormal behavior detection (Tang et al. 2021; Qiu et al. 2022). Due to its potential applications in the real world, HPE remains an active area of research (Niemirepo, Viitanen, and Vanne 2020; Yu et al. 2021; Zhang, Zhu, and Ye 2019; Li et al. 2022, 2021c; Jiang et al. 2023).…”

Section: Introduction (mentioning, confidence: 99%)

SHaRPose: Sparse High-Resolution Representation for Human Pose Estimation

An, Zhao, Gong, et al. 2024. AAAI

High-resolution representation is essential for achieving good performance in human pose estimation models. To obtain such features, existing works use high-resolution input images or fine-grained image tokens. However, this dense high-resolution representation brings a significant computational burden. In this paper, we address the following question: "Since only sparse human keypoint locations are detected in human pose estimation, is it really necessary to describe the whole image in a dense, high-resolution manner?" Based on dynamic transformer models, we propose a framework that uses only Sparse High-resolution Representations for human Pose estimation (SHaRPose). SHaRPose consists of two stages. At the coarse stage, the relations between image regions and keypoints are dynamically mined while a coarse estimation is generated. A quality predictor then decides whether the coarse estimation should be refined. At the fine stage, SHaRPose builds sparse high-resolution representations only for the regions related to the keypoints and produces refined, high-precision pose estimates. Extensive experiments demonstrate the effectiveness of the proposed method. Specifically, compared to the state-of-the-art method ViTPose, SHaRPose-Base achieves 77.4 AP (+0.5 AP) on the COCO validation set and 76.7 AP (+0.5 AP) on the COCO test-dev set, and runs inference 1.4x faster than ViTPose-Base. Code is available at https://github.com/AnxQ/sharpose.
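The coarse-to-fine control flow described in the abstract can be illustrated with a small sketch. It is not the SHaRPose implementation (see the linked repository for that). The abstract does not say how region-keypoint relations are scored, how the quality threshold is set, or how many regions are kept, so the relevance matrix, the `threshold` and `top_k` values, and every function name below (`select_keypoint_regions`, `coarse_to_fine`, `high_res_tokens`) are hypothetical. The sketch shows only the gating idea: skip refinement when the predicted quality is high enough, and otherwise give high-resolution tokens only to the regions most related to each keypoint.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class PoseDecision:
    stage: str  # "coarse" (refinement skipped) or "fine" (refinement run)
    hr_regions: List[int] = field(default_factory=list)  # regions given high-res tokens


def select_keypoint_regions(relation: Sequence[Sequence[float]], top_k: int) -> List[int]:
    """Return the union of the top_k most relevant regions for each keypoint.

    relation[j][r] is a hypothetical relevance score between keypoint j and
    coarse image region r (for example, attention mined in the coarse stage).
    """
    selected = set()
    for scores in relation:
        ranked = sorted(range(len(scores)), key=lambda r: scores[r], reverse=True)
        selected.update(ranked[:top_k])
    return sorted(selected)


def coarse_to_fine(relation: Sequence[Sequence[float]], quality: float,
                   threshold: float = 0.8, top_k: int = 2) -> PoseDecision:
    """Accept the coarse pose if its predicted quality passes the (assumed) threshold."""
    if quality >= threshold:
        return PoseDecision(stage="coarse")
    return PoseDecision(stage="fine", hr_regions=select_keypoint_regions(relation, top_k))


def high_res_tokens(num_regions: int, tokens_per_region: int) -> int:
    """Token count when each listed region is split into finer sub-tokens."""
    return num_regions * tokens_per_region


if __name__ == "__main__":
    # Toy example: two keypoints, six coarse regions (numbers are illustrative only).
    relation = [
        [0.10, 0.70, 0.05, 0.12, 0.03, 0.00],
        [0.05, 0.10, 0.02, 0.03, 0.60, 0.20],
    ]
    print(coarse_to_fine(relation, quality=0.9))  # PoseDecision(stage='coarse', hr_regions=[])
    decision = coarse_to_fine(relation, quality=0.5)
    print(decision.hr_regions)  # [1, 3, 4, 5]
    dense = high_res_tokens(len(relation[0]), tokens_per_region=4)
    sparse = high_res_tokens(len(decision.hr_regions), tokens_per_region=4)
    print(dense, sparse)  # 24 16
```

In this toy case the fine stage builds high-resolution tokens for 4 of 6 regions (16 instead of 24 tokens). The saving grows as the image contains more regions that are unrelated to any keypoint, which matches the abstract's motivation for sparse representations.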