2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019
DOI: 10.1109/cvpr.2019.01111

Self-Supervised 3D Hand Pose Estimation Through Training by Fitting

Abstract: We present a self-supervision method for 3D hand pose estimation from depth maps. We begin with a neural network initialized with synthesized data and fine-tune it on real but unlabelled depth maps by minimizing a set of data-fitting terms. By approximating the hand surface with a set of spheres, we design a differentiable hand renderer to align estimates by comparing the rendered and input depth maps. In addition, we place a set of priors including a data-driven term to further regulate the estimate's kinemati…
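The abstract's sphere-based renderer and depth-fitting term can be illustrated with a minimal sketch. It assumes an orthographic camera looking along +z, one pixel per unit, a fixed background depth, and a plain mean absolute depth difference as the data-fitting term. These choices, and the names `render_depth`, `depth_fitting_loss` and `BACKGROUND`, are hypothetical and are not taken from the paper; this excerpt does not give its camera model, sphere layout or loss weights. Each pixel keeps the nearest front-surface depth among the spheres that cover it.

```python
import math
from typing import List, Tuple

# Hypothetical sphere parameterization: (cx, cy, cz, r) in camera coordinates.
# Assumption: orthographic camera looking along +z, one pixel per unit.
Sphere = Tuple[float, float, float, float]

BACKGROUND = 10.0  # hypothetical depth for pixels that no sphere covers


def render_depth(spheres: List[Sphere], width: int, height: int) -> List[List[float]]:
    """Render a depth map in which each pixel keeps the nearest sphere surface."""
    depth = [[BACKGROUND] * width for _ in range(height)]
    for v in range(height):
        for u in range(width):
            for cx, cy, cz, r in spheres:
                d2 = (u - cx) ** 2 + (v - cy) ** 2
                if d2 < r * r:
                    # Front (camera-facing) surface of the sphere on this ray.
                    z = cz - math.sqrt(r * r - d2)
                    depth[v][u] = min(depth[v][u], z)
    return depth


def depth_fitting_loss(rendered: List[List[float]], observed: List[List[float]]) -> float:
    """Assumed data-fitting term: mean absolute per-pixel depth difference."""
    total = 0.0
    count = 0
    for row_r, row_o in zip(rendered, observed):
        for zr, zo in zip(row_r, row_o):
            total += abs(zr - zo)
            count += 1
    return total / count


if __name__ == "__main__":
    # Two illustrative spheres standing in for parts of a hand.
    hand = [(2.0, 2.0, 5.0, 1.5), (4.0, 2.0, 5.5, 1.0)]
    observed = render_depth(hand, 6, 5)
    print(depth_fitting_loss(observed, observed))  # 0.0
    # Shifting the estimate by one pixel produces a nonzero misfit.
    shifted = [(cx + 1.0, cy, cz, r) for cx, cy, cz, r in hand]
    print(depth_fitting_loss(render_depth(shifted, 6, 5), observed) > 0)  # True
```

Within a covered pixel the rendered depth cz - sqrt(r^2 - d^2) is a smooth function of the sphere's center and radius, so the loss has gradients almost everywhere; the min over spheres and the silhouette boundary are only piecewise smooth. A trainable version would be written in an autodiff framework, and how the paper treats those non-smooth points is not stated in this excerpt.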

Cited by 106 publications (58 citation statements). References: 50 publications.

“…Among the human limbs, the hand is the most easily used and is a medium capable of conveying various expressions; thus, gesture interaction mostly occurs by using hand gesture interaction, which requires the use of the hands. Hand gesture interaction necessitates accurate hand detection and hand joint information [17,18].…”
Section: Gesture Interaction (mentioning; confidence: 99%)

“…Supported by experiments, it is shown, that the synthetic data enable the models to generalize better to real-world test data. In [56], the authors propose a self-supervision method for learning the 3D hand pose from an unlabeled depth map. The method is initialized by synthetic data in a supervised manner and fine-tuned on real depth maps in unsupervised manner.…”
Section: Related Work (mentioning; confidence: 99%)

“…For example, Ge et al [81] used a synthetic dataset containing both ground truth 3D meshes and 3D poses to realize 3D hand shape and pose estimation. Wan et al [82] used depth maps, which were generated online from a hand model provided by [45] to train the deep neural network.…”
Section: Convolution Neural Network (mentioning; confidence: 99%)