2020
DOI: 10.1609/aaai.v34i07.6761

AWR: Adaptive Weighting Regression for 3D Hand Pose Estimation

Abstract: In this paper, we propose an adaptive weighting regression (AWR) method to leverage the advantages of both detection-based and regression-based methods. Hand joint coordinates are estimated as a discrete integration of all pixels in a dense representation, guided by adaptive weight maps. This learnable aggregation process introduces both dense and joint supervision, which allows end-to-end training and brings adaptability to the weight maps, making the network more accurate and robust. Comprehensive exploration experiments a…
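The aggregation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the dense representation is a set of per-pixel offsets pointing toward the joint, that the weight map is normalized with a softmax, and it works in 2D for brevity. The function names `softmax` and `aggregate_joint` and all inputs are hypothetical.

```python
import math

def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_joint(pixels, offsets, scores):
    """Estimate one joint as a weighted sum of per-pixel votes.

    pixels  : list of (x, y) pixel coordinates
    offsets : list of (dx, dy) offsets from each pixel to the joint
              (assumed form of the dense representation)
    scores  : list of raw weight-map values, one per pixel
    """
    weights = softmax(scores)
    x = y = 0.0
    for w, (px, py), (dx, dy) in zip(weights, pixels, offsets):
        # Each pixel votes for the location pixel + offset.
        x += w * (px + dx)
        y += w * (py + dy)
    return x, y

# Three pixels whose offsets all point at (1, 1).
pixels = [(0, 0), (2, 0), (0, 2)]
offsets = [(1, 1), (-1, 1), (1, -1)]
joint = aggregate_joint(pixels, offsets, scores=[0.3, 1.2, -0.5])
```

Because the weights sum to 1, pixels that agree on a location reproduce it regardless of the scores; when votes disagree, the weight map decides which pixels dominate. Every operation is differentiable, so a loss on the aggregated coordinate can propagate back to both the offsets and the weights.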

Cited by 62 publications (41 citation statements)
References 4 publications
“…Anchor points are densely set on the input image to behave as local regressors for the joints and able to capture global-local spatial context information. AWR [11] adopts a learnable and adaptive weighting operation that is used to aggregate spatial information of different regions in dense representations with 2D convolutional CNNs. The weighting operation adds direct supervision on joint coordinates and draw consensus between the training and inference as well as enhancing the model's accuracy and generalisation ability by adaptively aggregating spatial information from related regions.…”
Section: Evaluated Methods
Citation type: mentioning
confidence: 99%
“…NTIS uses an efficient voxel-based representation as V2V-PoseNet [22] with a deeper architecture and weighted sub-voxel predictions on quarter of each voxel representations for robustness. AWR [13] adopts a learnable and adaptive weighting operation that is used to aggregate spatial information of different regions in dense representations with 2D convolutional CNNs. The weighting operation adds direct supervision on joint coordinates and draw consensus between training and inference as well as it enhances the models accuracy and generalisation ability by adaptively aggregating spatial information from related regions. Strawberryfg [39] employes a render-and-compare stage to enforce voxel-wise supervision for model training and adopts a 3D skeleton volume renderer to re-parameterize an initial pose estimate obtained similar to [35].…”
Section: Point Cloud 512 3d Points
Citation type: mentioning
confidence: 99%
“…Some approaches take advantage of both detection-based and regression-based methods. Similarly, AWR [13], Strawberryfg [39] estimates hand joint probability maps to estimate joint locations with a differentiable soft-argmax operation [35]. CrazyHand's hierarchical approach regresses the joint locations from joint probability maps.…”
Section: Point Cloud 512 3d Points
Citation type: mentioning
confidence: 99%
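
The differentiable soft-argmax operation mentioned in the statement above can be sketched as follows. This is a generic illustration rather than the code of any cited method: it assumes a single 2D heatmap stored as a list of rows, and the function name `soft_argmax_2d` and the sharpness parameter `beta` are hypothetical.

```python
import math

def soft_argmax_2d(heatmap, beta=1.0):
    """Return the expected (x, y) location under softmax(beta * heatmap)."""
    flat = [beta * v for row in heatmap for v in row]
    m = max(flat)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in flat]
    total = sum(exps)
    width = len(heatmap[0])
    x = y = 0.0
    for i, e in enumerate(exps):
        p = e / total          # probability assigned to this cell
        x += p * (i % width)   # column index
        y += p * (i // width)  # row index
    return x, y

# A heatmap that is symmetric around its centre cell.
center = soft_argmax_2d([[0, 0, 0],
                         [0, 5, 0],
                         [0, 0, 0]])
```

Unlike a hard argmax, the result is a smooth function of the heatmap values, so it can be trained with a coordinate loss. A larger `beta` pulls the estimate closer to the location of the maximum.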
“…Although estimating hand pose [1][2][3][4][5][6][7] and object pose [8][9][10][11] in isolation has made remarkable success, jointly estimating both hand and manipulated object poses from a single RGB image remains a challenging task. Particularly, complicated HOI scenarios bring in various issues including not only complex pose variations and self-occlusions that commonly occur in hand-only or object-only pose estimation, but also severe mutual occlusion between hand and manipulated object [12].…”
Section: Introduction
Citation type: mentioning
confidence: 99%