Hand3D: Hand Pose Estimation using 3D Neural Network

Deng, Xiaoming; Yang, Shuo; Zhang, Yinda; Tan, Ping; Chang, Liang; Wang, Hongan

doi:10.48550/arxiv.1704.02224

Cited by 14 publications

(28 citation statements)

References 22 publications


“…Although the NYU dataset annotates 36 joints, we use the same 14 joints for evaluation as most earlier works like [18] or [16].

[Flattened results table from the citing paper; the first value lost its method label in extraction.]
(unlabeled): 15.5 mm
DeepModel [20]: 16.9 mm
DeepPrior [14]: 19.8 mm
DeepPrior++ [15]: 12.3 mm
Feedback [17]: 16.2 mm
Global to Local [13]: 15.6 mm
Hand3D [9]: 17.6 mm
HMDN [22]: 16.3 mm
Pose-REN [8]: 11.8 mm
REN [12]: 12.7 mm
SGN [22]: 15.9 mm
V2V-PoseNet [16]: 8.4 mm

[…] the transformation parameters. Combining the appearance normalization pipeline with the Variable Hand CNN reduces the average joint location error by 3.3 mm.…”

Section: Results on the NYU Dataset (mentioning)

confidence: 99%
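The figures in the statement above are average joint location errors in millimetres. As a minimal sketch of how such a number is commonly computed, assuming it is the mean Euclidean distance between predicted and ground-truth 3D joints over all frames and joints (the citing paper's exact protocol is not given in this excerpt, and the function name `mean_joint_error` is hypothetical):

```python
import math
from typing import Sequence, Tuple

Point3D = Tuple[float, float, float]
Frame = Sequence[Point3D]  # one 3D position per evaluated joint


def mean_joint_error(pred: Sequence[Frame], gt: Sequence[Frame]) -> float:
    """Mean Euclidean distance over all frames and joints.

    Assumption: inputs are in millimetres and joints are listed in the
    same order in `pred` and `gt` (e.g. the 14 NYU evaluation joints).
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must contain the same number of frames")
    total = 0.0
    count = 0
    for pred_frame, gt_frame in zip(pred, gt):
        if len(pred_frame) != len(gt_frame):
            raise ValueError("each frame must contain the same number of joints")
        for p, g in zip(pred_frame, gt_frame):
            total += math.dist(p, g)  # Euclidean distance (Python 3.8+)
            count += 1
    if count == 0:
        raise ValueError("no joints to evaluate")
    return total / count


# One frame, two joints: errors are 0 mm and 5 mm (a 3-4-5 triangle).
example_pred = [[(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]]
example_gt = [[(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]]
print(mean_joint_error(example_pred, example_gt))  # 2.5
```

Averaging over joints and frames together weights every joint equally; a per-joint breakdown would need a separate accumulator per joint index.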

“…Hand pose estimation approaches can be divided into three categories: 1) the generative, model-driven approaches that fit a hand model to the image observations by minimizing a cost function [4] [5] [6] [7], 2) the discriminative, data-driven approaches that directly predict the 3D joint locations from the images [8] [9] [10] [11] [12] [13] [14] [15] [16], and 3) the hybrid approaches that combine discriminative and generative elements [17] [18] [19] [20].…”

Section: Introduction (mentioning)

confidence: 99%

“…Discriminative methods play an important role because they are needed to initialize generative tracking methods and to recover in the case of tracking failure. State-of-the-art discriminative methods use deep learning components such as 2D [8] [10] [12] [13] [14] [17] [15] [18] [21] [22] [20] or 3D [9] [11] [16] Convolutional Neural Networks (CNN) that might incorporate residual modules [8] [12] [Fig. 1: Overview of our approach.]…”

Section: Introduction (mentioning)

confidence: 99%


Model-based Hand Pose Estimation for Generalized Hand Shape with Appearance Normalization

Wöhlke, Li, Lee. 2018. Preprint.


Since the emergence of large annotated datasets, state-of-the-art hand pose estimation methods have been mostly based on discriminative learning. Recently, a hybrid approach has embedded a kinematic layer into the deep learning structure in such a way that the pose estimates obey the physical constraints of human hand kinematics. However, the existing approach relies on a single person's hand shape parameters, which are fixed constants. Therefore, the existing hybrid method has difficulty generalizing to new, unseen hands. In this work, we extend the kinematic layer to make the hand shape parameters learnable. In this way, the learnt network can generalize to arbitrary hand shapes. Furthermore, inspired by the idea of Spatial Transformer Networks, we apply a cascade of appearance normalization networks to decrease the variance in the input data. The input images are shifted, rotated, and globally scaled to a similar appearance. The effectiveness and limitations of our proposed approach are extensively evaluated on the Hands 2017 challenge dataset and the NYU dataset.
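The abstract above describes shifting, rotating, and globally scaling the input, which together form a 2D similarity transform. The sketch below illustrates only that geometric operation, applied to pixel coordinates rather than to a full image; it is not the cited paper's network, and the function name `similarity_transform` and its parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]


def similarity_transform(
    points: Sequence[Point2D],
    shift: Point2D,
    angle_rad: float,
    scale: float,
) -> List[Point2D]:
    """Rotate by `angle_rad`, scale uniformly by `scale`, then add `shift`.

    Assumption: rotation is counter-clockwise about the origin and the
    translation is applied last.
    """
    cos_a = math.cos(angle_rad)
    sin_a = math.sin(angle_rad)
    tx, ty = shift
    transformed = []
    for x, y in points:
        x_rot = scale * (cos_a * x - sin_a * y)
        y_rot = scale * (sin_a * x + cos_a * y)
        transformed.append((x_rot + tx, y_rot + ty))
    return transformed


# Rotating (1, 0) by 90 degrees, doubling it, and shifting by (1, 1)
# lands approximately on (1, 3).
print(similarity_transform([(1.0, 0.0)], (1.0, 1.0), math.pi / 2, 2.0))
```

In a learned pipeline the three parameters would be predicted per image; here they are passed in by hand to keep the geometry visible.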


“…Random forest-based methods [21, 23, 39, 41, 42, 43, 48] provide fast and accurate performance. However, they utilize hand-crafted features and are overcome by recent CNN-based approaches [1, 3, 4, 6, 7, 10, 11, 14, 15, 24, 29, 30, 37, 45, 50, 51] that can learn useful features by themselves. Tompson et al. [45] firstly utilized CNN to localize hand keypoints by estimating 2D heatmaps for each hand joint.…”

Section: Related Work (mentioning)

confidence: 99%

V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map

Moon, Chang, Lee. 2017. Preprint.


Most of the existing deep learning-based methods for 3D hand and human pose estimation from a single depth map are based on a common framework that takes a 2D depth map and directly regresses the 3D coordinates of keypoints, such as hand or human body joints, via 2D convolutional neural networks (CNNs). The first weakness of this approach is the presence of perspective distortion in the 2D depth map. While the depth map is intrinsically 3D data, many previous methods treat depth maps as 2D images that can distort the shape of the actual object through projection from 3D to 2D space. This compels the network to perform perspective distortion-invariant estimation. The second weakness of the conventional approach is that directly regressing 3D coordinates from a 2D image is a highly nonlinear mapping, which causes difficulty in the learning procedure. To overcome these weaknesses, we firstly cast the 3D hand and human pose estimation problem from a single depth map into a voxel-to-voxel prediction that uses a 3D voxelized grid and estimates the per-voxel likelihood for each keypoint. We design our model as a 3D CNN that provides accurate estimates while running in real-time. Our system outperforms previous methods in almost all publicly available 3D hand and human pose estimation datasets and placed first in the HANDS 2017 frame-based 3D hand pose estimation challenge. The code is available in 1 .
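The abstract above replaces the 2D depth image with a 3D voxelized grid. As a rough illustration of that input representation only, the sketch below back-projects a depth map with a pinhole camera model and marks occupied voxels in a cube around a reference point. It is not the V2V-PoseNet code; the intrinsics `fx`, `fy`, `cx`, `cy`, the cube size, and all function names are assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]
Grid = List[List[List[int]]]


def back_project(depth: Sequence[Sequence[float]],
                 fx: float, fy: float, cx: float, cy: float) -> List[Point3D]:
    """Convert a depth map (mm, 0 = missing) to 3D camera-space points."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, float(z)))
    return points


def voxelize(points: Sequence[Point3D], center: Point3D,
             cube_mm: float, res: int) -> Grid:
    """Binary occupancy grid of res^3 voxels in a cube centred on `center`."""
    grid = [[[0] * res for _ in range(res)] for _ in range(res)]
    voxel_mm = cube_mm / res
    for p in points:
        idx = [int((p[a] - center[a] + cube_mm / 2) // voxel_mm) for a in range(3)]
        if all(0 <= i < res for i in idx):  # drop points outside the cube
            grid[idx[0]][idx[1]][idx[2]] = 1
    return grid


def voxel_center(idx: Tuple[int, int, int], center: Point3D,
                 cube_mm: float, res: int) -> Point3D:
    """Map a voxel index (e.g. the argmax of a likelihood volume) back to mm."""
    voxel_mm = cube_mm / res
    return tuple(center[a] - cube_mm / 2 + (idx[a] + 0.5) * voxel_mm
                 for a in range(3))


# Tiny 2x2 depth map with two valid pixels at 500 mm.
pts = back_project([[0, 500], [500, 0]], fx=500.0, fy=500.0, cx=0.5, cy=0.5)
occupancy = voxelize(pts, center=(0.0, 0.0, 500.0), cube_mm=4.0, res=2)
print(sum(v for plane in occupancy for row in plane for v in row))  # 2
```

A network that predicts a per-voxel likelihood for each keypoint could then be decoded by taking the highest-scoring voxel index and passing it to `voxel_center`.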


“…Our binary-based approach is competitive with the state-of-the-art depth-based methods, and should be able to serve as a strong baseline on this task for future study. 3D hand pose estimation from depth maps: The performance of estimating 3D hand poses from depth maps has been improved rapidly (Choi et al. 2015; Deng et al. 2017; Ye, Yuan, and Kim 2016; Baek, In Kim, and Kim 2018; Wan et al. 2018) in terms of prediction accuracy. The studies on depth-based hand pose estimation generally adopt either generative or discriminative methods.…”

Section: Introduction (mentioning)

confidence: 99%

Silhouette-Net: 3D Hand Pose Estimation from Silhouettes

Lee, Liu, Chen et al. 2019. Preprint.


3D hand pose estimation has received a lot of attention for its wide range of applications and has made great progress owing to the development of deep learning. Existing approaches mainly consider different input modalities and settings, such as monocular RGB, multi-view RGB, depth, or point cloud, to provide sufficient cues for resolving variations caused by self-occlusion and viewpoint change. In contrast, this work aims to address the less-explored idea of using minimal information to estimate 3D hand poses. We present a new architecture that automatically learns a guidance from implicit depth perception and solves the ambiguity of hand pose through end-to-end training. The experimental results show that 3D hand poses can be accurately estimated from solely hand silhouettes without using depth maps. Extensive evaluations on the 2017 Hands In the Million Challenge (HIM2017) benchmark dataset further demonstrate that our method achieves comparable or even better performance than recent depth-based approaches and serves as the state-of-the-art of its own kind on estimating 3D hand poses from silhouettes.
