Dense 3D Regression for Hand Pose Estimation

Wan, Chengde; Probst, Thomas; Gool, Luc Van; Yao, Angela

doi:10.1109/cvpr.2018.00540

Cited by 159 publications

(155 citation statements)

References 55 publications

Supporting

Mentioning

144

Contrasting

“…With the abundance of affordable commodity depth cameras, the research literature focused naturally more on estimating 3D hand pose through depth observations (e.g. [62,66,10,36,61]), and many works also explored this problem in multi-view setups [33,65,41,8,31,50]. When it comes to a monocular color input, the problem becomes inherently ill posed due to the increased depth and scale ambiguities, but that did not prevent several researchers [4,9,51,57,63,39] from attempting to solve it in the past albeit with limited results.…”

Section: Introduction (mentioning)

confidence: 99%

3D Hand Shape and Pose From Images in the Wild

Boukhayma

Torr

2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

348

380

We present in this work the first end-to-end deep learning based method that predicts both 3D hand shape and pose from RGB images in the wild. Our network consists of the concatenation of a deep convolutional encoder, and a fixed model-based decoder. Given an input image, and optionally 2D joint detections obtained from an independent CNN, the encoder predicts a set of hand and view parameters. The decoder has two components: A pre-computed articulated mesh deformation hand model that generates a 3D mesh from the hand parameters, and a re-projection module controlled by the view parameters that projects the generated hand into the image domain. We show that using the shape and pose prior knowledge encoded in the hand model within a deep learning framework yields state-of-the-art performance in 3D pose prediction from images on standard benchmarks, and produces geometrically valid and plausible 3D reconstructions. Additionally, we show that training with weak supervision in the form of 2D joint annotations on datasets of images in the wild, in conjunction with full supervision in the form of 3D joint annotations on limited available datasets allows for good generalization to 3D shape and pose predictions on images in the wild.

“…We can achieve about 220.7 fps speed on a single GPU which meets the requirement of real-time applications. Although V2V [5] and [30] achieved most accurate results, they only can run at 3.5 fps and 27.8 fps, respectively.…”

Section: E Runtime Analysis (mentioning)

confidence: 99%

HMTNet: 3D Hand Pose Estimation From Single Depth Image Based on Hand Morphological Topology

et al. 2020

Thanks to the rapid development of CNNs and depth sensors, great progress has been made in 3D hand pose estimation. Nevertheless, it is still far from being solved for its cluttered circumstance and severe self-occlusion of hand. In this paper, we propose a method that takes advantage of human hand morphological topology (HMT) structure to improve the pose estimation performance. The main contributions of our work can be listed as below. Firstly, in order to extract more powerful features, we concatenate original and last layer of initial feature extraction module to preserve hand information better. Next, regression module inspired from hand morphological topology is proposed. In this submodule, we design a tree-like network structure according to hand joints distribution to make use of high order dependency of hand joints. Lastly, we conducted sufficient ablation experiments to verify our proposed method on each dataset. Experimental results on three popular hand pose dataset show superior performance of our method compared with the state-of-the-art methods. On ICVL and NYU dataset, our method outperforms great improvement over 2D state-of-the-art methods. On MSRA dataset, our method achieves comparable accuracy with the state-of-the-art methods. To summarize, our method is the most efficient method which can run at 220.7 fps on a single GPU compared with approximate accurate methods at present. The code will be available at https://github.com/weiguochow/HMTNet.

Index terms: 3D hand pose estimation, concatenated feature, hand morphological topology, single depth image.

“…3, we see that the results of our method are in the range of recent state-of-the-art approaches even using only a small fraction of the labeled real samples. Also note that several of the most recent methods focus on improved input and/or output representations [4,6,20,40], which are orthogonal to our work.…”

Section: Comparison On Full Dataset (mentioning)

confidence: 99%

“…
DISCO Nets [2] (NIPS 2016): 20.7
Crossing Nets [39] (CVPR 2017): 15.5
LSPS [1] (BMVC 2018): 15.4
Weak supervision [22] (CVIU 2017): 14.8
Lie-X [45] (IJCV 2017): 14.5
3DCNN [7] (CVPR 2017): 14.1
REN-9x6x6 [41] (JVCI 2018): 12.7
DeepPrior++ [23] (ICCVw 2017): 12.3
Pose Guided REN [3] (Neurocomputing 2018): 11.8
SHPR-Net [4] (IEEE Access 2018): 10.8
Hand PointNet [6] (CVPR 2018): 10.5
Dense 3D regression [40] (CVPR 2018): 10.2
V2V single model [20] (CVPR 2018): 9.2
V2V ensemble [20] (CVPR 2018): 8.4
Feature mapping [29]

The comparisons in this section are based upon the numbers published by the authors. That is, these comparisons disregard differences in the used data subsamples, models, architectures, and other specificities.…”

Section: ME (mm) (mentioning)

confidence: 99%

MURAUER: Mapping Unlabeled Real Data for Label AUstERity

Poier

Opitz

Schinagl

et al. 2019

2019 IEEE Winter Conference on Applications of Computer Vision (WACV)

Data labeling for learning 3D hand pose estimation models is a huge effort. Readily available, accurately labeled synthetic data has the potential to reduce the effort. However, to successfully exploit synthetic data, current state-of-the-art methods still require a large amount of labeled real data. In this work, we remove this requirement by learning to map from the features of real data to the features of synthetic data mainly using a large amount of synthetic and unlabeled real data. We exploit unlabeled data using two auxiliary objectives, which enforce that (i) the mapped representation is pose specific and (ii) at the same time, the distributions of real and synthetic data are aligned. While pose specificity is enforced by a self-supervisory signal requiring that the representation is predictive for the appearance from different views, distributions are aligned by an adversarial term. In this way, we can significantly improve the results of the baseline system, which does not use unlabeled data and outperform many recent approaches already with about 1% of the labeled real data. This presents a step towards faster deployment of learning based hand pose estimation, making it accessible for a larger range of applications. © 2019 IEEE. Project webpage providing code and additional material can be found at https://poier.github.io/murauer
