AWR: Adaptive Weighting Regression for 3D Hand Pose Estimation

Huang, Weiting; Ren, Pengfei; Wang, Jingyu; Qi, Qi; Sun, Haifeng

doi:10.1609/aaai.v34i07.6761

Cited by 62 publications

(41 citation statements)

References 4 publications


“…Anchor points are densely set on the input image to behave as local regressors for the joints and are able to capture global-local spatial context information. AWR [11] adopts a learnable and adaptive weighting operation that is used to aggregate spatial information of different regions in dense representations with 2D convolutional CNNs. The weighting operation adds direct supervision on joint coordinates and draws consensus between training and inference, and it enhances the model's accuracy and generalisation ability by adaptively aggregating spatial information from related regions.…”

Section: Evaluated Methods (mentioning)

confidence: 99%

Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation Under Hand-Object Interaction

Armagan

Garcia-Hernando

Baek

et al. 2020

Lecture Notes in Computer Science

Self Cite


We study how well different types of approaches generalise in the task of 3D hand pose estimation under single-hand scenarios and hand-object interaction. We show that the accuracy of state-of-the-art methods can drop, and that they fail mostly on poses absent from the training set. Unfortunately, since the space of hand poses is high-dimensional, it is inherently not feasible to cover the whole space densely, despite recent efforts in collecting large-scale training datasets. This sampling problem is even more severe when hands are interacting with objects and/or inputs are RGB rather than depth images, as RGB images also vary with lighting conditions and colors. To address these issues, we designed a public challenge (HANDS'19) to evaluate the abilities of current 3D hand pose estimators (HPEs) to interpolate and extrapolate the poses of a training set. More exactly, HANDS'19 is designed (a) to evaluate the influence of both depth and color modalities on 3D hand pose estimation, under the presence or absence of objects; (b) to assess the generalisation abilities w.r.t. four main axes: shapes, articulations, viewpoints, and objects; (c) to explore the use of synthetic hand models to fill the gaps of current datasets. Through the challenge, the overall accuracy has dramatically improved over the baseline, especially on extrapolation tasks, from 27mm to 13mm mean joint error. Our analyses highlight the impacts of: data pre-processing, ensemble approaches, the use of a parametric 3D hand model (MANO), and different HPE methods/backbones.



“…NTIS uses an efficient voxel-based representation as V2V-PoseNet [22] with a deeper architecture and weighted sub-voxel predictions on quarter of each voxel representations for robustness. AWR [13] adopts a learnable and adaptive weighting operation that is used to aggregate spatial information of different regions in dense representations with 2D convolutional CNNs. The weighting operation adds direct supervision on joint coordinates and draws consensus between training and inference, and it enhances the model's accuracy and generalisation ability by adaptively aggregating spatial information from related regions. Strawberryfg [39] employs a render-and-compare stage to enforce voxel-wise supervision for model training and adopts a 3D skeleton volume renderer to re-parameterize an initial pose estimate obtained similar to [35].…”

Section: Point Cloud 512 3d Points (mentioning)

confidence: 99%

“…Some approaches take advantage of both detection-based and regression-based methods. Similarly, AWR [13] and Strawberryfg [39] estimate hand joint probability maps, from which joint locations are obtained with a differentiable soft-argmax operation [35]. CrazyHand's hierarchical approach regresses the joint locations from joint probability maps.…”

Section: Point Cloud 512 3d Points (mentioning)

confidence: 99%
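The differentiable soft-argmax operation mentioned in the statement above can be illustrated with a minimal sketch. This is not the code of AWR, Strawberryfg, or the cited work [35]; the function name `soft_argmax_2d` and the temperature parameter `beta` are assumptions made for illustration. The idea is to turn a score map into a probability map with a softmax and then return the probability-weighted average of the pixel coordinates, which stays differentiable where a hard argmax does not.

```python
import math


def soft_argmax_2d(heatmap, beta=1.0):
    """Return the expected (x, y) location under a softmax of `heatmap`.

    `heatmap` is a list of rows of raw scores. `beta` is a hypothetical
    temperature: larger values push the result towards the hard argmax.
    """
    # Subtract the maximum score before exponentiating, for numerical stability.
    peak = max(max(row) for row in heatmap)
    weights = [[math.exp(beta * (v - peak)) for v in row] for row in heatmap]
    total = sum(sum(row) for row in weights)

    # Expected coordinates: a probability-weighted average over all pixels.
    x = sum(w * col for row in weights for col, w in enumerate(row)) / total
    y = sum(w * r for r, row in enumerate(weights) for w in row) / total
    return x, y


# A 3x5 score map with a single peak at column 2, row 1.
heatmap = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 10.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
x, y = soft_argmax_2d(heatmap)
```

Because every pixel contributes to the average, the estimate can fall between pixel centres, and gradients flow to the whole map rather than to a single maximum.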

Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation under Hand-Object Interaction

Armagan

Garcia-Hernando

Baek

et al. 2020

Preprint

Self Cite


In this work, we study how well different types of approaches generalise in the task of 3D hand pose estimation under hand-object interaction and single-hand scenarios. We show that the accuracy of state-of-the-art methods can drop, and that they fail mostly on poses absent from the training set. Unfortunately, since the space of hand poses is high-dimensional, it is inherently not feasible to cover the whole space densely, despite recent efforts in collecting large-scale training datasets. This sampling problem is even more severe when hands are interacting with objects and/or inputs are RGB rather than depth images, as RGB images also vary with lighting conditions and colors. To address these issues, we designed a public challenge to evaluate the abilities of current 3D hand pose estimators (HPEs) to interpolate and extrapolate the poses of a training set. More exactly, our challenge is designed (a) to evaluate the influence of both depth and color modalities on 3D hand pose estimation, under the presence or absence of objects; (b) to assess the generalisation abilities w.r.t. four main axes: shapes, articulations, viewpoints, and objects; (c) to explore the use of a synthetic hand model to fill the gaps of current datasets. Through the challenge, the overall accuracy has dramatically improved over the baseline, especially on extrapolation tasks, from 27mm to 13mm mean joint error. Our analyses highlight the impacts of: data pre-processing, ensemble approaches, the use of the MANO model, and different HPE methods/backbones.
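The mean joint error quoted in the abstract above is commonly computed as the average Euclidean distance between predicted and ground-truth 3D joint positions. The sketch below follows that common definition; the challenge's exact evaluation protocol is not given here, and the name `mean_joint_error` and the sample coordinates are assumptions made for illustration.

```python
import math


def mean_joint_error(pred, gt):
    """Average Euclidean distance between matching 3D joints.

    `pred` and `gt` are equal-length sequences of (x, y, z) tuples,
    assumed here to be expressed in millimetres.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    # math.dist gives the Euclidean distance between two points.
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


# Two hypothetical joints: one predicted exactly, one off by 5 mm.
pred = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
error_mm = mean_joint_error(pred, gt)  # (0 + 5) / 2 = 2.5
```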


“…Although estimating hand pose [1][2][3][4][5][6][7] and object pose [8][9][10][11] in isolation has achieved remarkable success, jointly estimating both hand and manipulated object poses from a single RGB image remains a challenging task. In particular, complicated HOI scenarios bring various issues, including not only the complex pose variations and self-occlusions that commonly occur in hand-only or object-only pose estimation, but also severe mutual occlusion between the hand and the manipulated object [12].…”

Section: Introduction (mentioning)

confidence: 99%

Coarse-to-Fine Hand–Object Pose Estimation with Interaction-Aware Graph Convolutional Network

Zhang

Li

Liu

et al. 2021

Sensors


The analysis of hand–object poses from RGB images is important for understanding and imitating human behavior and acts as a key factor in various applications. In this paper, we propose a novel coarse-to-fine two-stage framework for hand–object pose estimation, which explicitly models hand–object relations in 3D pose refinement rather than in the process of converting 2D poses to 3D poses. Specifically, in the coarse stage, 2D heatmaps of hand and object keypoints are obtained from the RGB image and subsequently fed into a pose regressor to derive coarse 3D poses. In the fine stage, an interaction-aware graph convolutional network called InterGCN is introduced to perform pose refinement by fully leveraging the hand–object relations in 3D context. One major challenge in 3D pose refinement is that relations between hand and object change dynamically across HOI scenarios. In response, we leverage both general and interaction-specific relation graphs to significantly enhance the capacity of the network to cover variations of HOI scenarios for successful 3D pose refinement. Extensive experiments demonstrate state-of-the-art performance of our approach on benchmark hand–object datasets.

