A2J: Anchor-to-Joint Regression Network for 3D Articulated Pose Estimation From a Single Depth Image

Xiong, Fei; Zhang, Boshen; Xiao, Yang; Cao, Zhiguo; Yu, Taidong; Zhou, Joey Tianyi; Yuan, Junsong

doi:10.1109/iccv.2019.00088

Cited by 164 publications

(145 citation statements)

References 45 publications

Supporting

Mentioning

144

Contrasting

Order By: Relevance

“…We first compare our method with others on HANDS 2017 dataset . Since the HANDS 2017 dataset does not provide test set labels publicly, we evaluate using only mean joint error metric and compare our method with Vanora , THU VCLab (Chen et al 2018a), oasis (Moon, Chang, and Lee 2018a), RCN-3D (Yuan et al 2018), V2V-PoseNet (Moon, Chang, and Lee 2018b) and A2J (Xiong et al 2019). Results in Table 5 reflect that our ResNet18 based method already exceeds previous state-of-the-art methods by a large margin.…”

Section: Comparison With State-of-the-art Methodsmentioning

confidence: 99%

“…Recent work (Xiong et al 2019) uses 2D offsets between anchor points and hand joints to represent 2D positions of joints. Due to the large variance of offsets, we further decompose them into 2D directional unit vector fields and closeness heatmaps, reflecting 2D directions and closeness from each pixel in depth images to target joints.…”

Section: Comprehensive Explorationsmentioning

confidence: 99%

“…Recently, deep neural network based methods have shown great performance in 3D hand pose estimation (Tompson et al 2014;Ge, Ren, and Yuan 2018;Wang et al 2018;Chen et al 2018a;Moon, Chang, and Lee 2018b;Wan et al 2018;Xiong et al 2019; Zhang and Zhang 2019; . These works can be divided into two categories according to networks' output type regressionbased methods and detection-based methods.…”

Section: Introductionmentioning

confidence: 99%

“…The idea of aggregating different regions in dense representations to derive joint coordinates has previously been seen in human (Sun et al 2018) or hand pose estimation (Xiong et al 2019; Zhang and Zhang 2019). However, (Sun et al 2018) for human pose estimation focus on RGB images and this type of representation ignores the 3D geometric properties of essentially 2.5D depth images.…”

Section: Introductionmentioning

confidence: 99%

“…However, (Sun et al 2018) for human pose estimation focus on RGB images and this type of representation ignores the 3D geometric properties of essentially 2.5D depth images. And (Xiong et al 2019; Zhang and Zhang 2019) for hand pose estimation are specialized for certain type of network structure, dense representation and input modality. The effectiveness and generality of this operation are not extensively validated.…”

Section: Introductionmentioning

confidence: 99%

See 4 more Smart Citations

AWR: Adaptive Weighting Regression for 3D Hand Pose Estimation

Huang

Ren

Wang

et al. 2020

AAAI

View full text Add to dashboard Cite

In this paper, we propose an adaptive weighting regression (AWR) method to leverage the advantages of both detection-based and regression-based method. Hand joint coordinates are estimated as discrete integration of all pixels in dense representation, guided by adaptive weight maps. This learnable aggregation process introduces both dense and joint supervision that allows end-to-end training and brings adaptability to weight maps, making network more accurate and robust. Comprehensive exploration experiments are conducted to validate the effectiveness and generality of AWR under various experimental settings, especially its usefulness for different types of dense representation and input modality. Our method outperforms other state-of-the-art methods on four publicly available datasets, including NYU, ICVL, MSRA and HANDS 2017 dataset.

show abstract

Section: Comparison With State-of-the-art Methodsmentioning

confidence: 99%

Section: Comprehensive Explorationsmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

See 3 more Smart Citations

AWR: Adaptive Weighting Regression for 3D Hand Pose Estimation

Huang

Ren

Wang

et al. 2020

AAAI

View full text Add to dashboard Cite

show abstract

JGR-P2O: Joint Graph Reasoning Based Pixel-to-Offset Prediction Network for 3D Hand Pose Estimation from a Single Depth Image

Fang

Xing-yan

Liu

et al. 2020

Lecture Notes in Computer Science

View full text Add to dashboard Cite

State-of-the-art single depth image-based 3D hand pose estimation methods are based on dense predictions, including voxel-to-voxel predictions, point-to-point regression, and pixel-wise estimations. Despite the good performance, those methods have a few issues in nature, such as the poor trade-off between accuracy and efficiency, and plain feature representation learning with local convolutions. In this paper, a novel pixel-wise prediction-based method is proposed to address the above issues. The key ideas are two-fold: a) explicitly modeling the dependencies among joints and the relations between the pixels and the joints for better local feature representation learning; b) unifying the dense pixel-wise offset predictions and direct joint regression for end-toend training. Specifically, we first propose a graph convolutional network (GCN) based joint graph reasoning module to model the complex dependencies among joints and augment the representation capability of each pixel. Then we densely estimate all pixels' offsets to joints in both image plane and depth space and calculate the joints' positions by a weighted average over all pixels' predictions, totally discarding the complex postprocessing operations. The proposed model is implemented with an efficient 2D fully convolutional network (FCN) backbone and has only about 1.4M parameters. Extensive experiments on multiple 3D hand pose estimation benchmarks demonstrate that the proposed method achieves new state-of-the-art accuracy while running very efficiently with around a speed of 110fps on a single NVIDIA 1080Ti GPU 4 . The code is available at https://github.com/fanglinpu/JGR-P2O.

show abstract

InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image

Moon

et al. 2020

Lecture Notes in Computer Science

190

311

View full text Add to dashboard Cite

Analysis of hand-hand interactions is a crucial step towards better understanding human behavior. However, most researches in 3D hand pose estimation have focused on the isolated single hand case. Therefore, we firstly propose (1) a large-scale dataset, InterHand2.6M, and (2) a baseline network, InterNet, for 3D interacting hand pose estimation from a single RGB image. The proposed InterHand2.6M consists of 2.6M labeled single and interacting hand frames under various poses from multiple subjects. Our InterNet simultaneously performs 3D single and interacting hand pose estimation. In our experiments, we demonstrate big gains in 3D interacting hand pose estimation accuracy when leveraging the interacting hand data in InterHand2.6M. We also report the accuracy of InterNet on InterHand2.6M, which serves as a strong baseline for this new dataset. Finally, we show 3D interacting hand pose estimation results from general images. Our code and dataset are available 1 .

show abstract

A2J: Anchor-to-Joint Regression Network for 3D Articulated Pose Estimation From a Single Depth Image

Cited by 164 publications

References 45 publications

AWR: Adaptive Weighting Regression for 3D Hand Pose Estimation

AWR: Adaptive Weighting Regression for 3D Hand Pose Estimation

JGR-P2O: Joint Graph Reasoning Based Pixel-to-Offset Prediction Network for 3D Hand Pose Estimation from a Single Depth Image

InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image

Contact Info

Product

Resources

About