2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition 2018
DOI: 10.1109/cvpr.2018.00540
Dense 3D Regression for Hand Pose Estimation

Abstract: We present a simple and effective method for 3D hand pose estimation from a single depth frame. As opposed to previous state-of-the-art methods based on holistic 3D regression, our method works on dense pixel-wise estimation. This is achieved by careful design choices in pose parameterization, which leverages both 2D and 3D properties of the depth map. Specifically, we decompose the pose parameters into a set of per-pixel estimations, i.e., 2D heat maps, 3D heat maps and unit 3D directional vector fields. The 2D/3…
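The per-pixel parameterization the abstract describes can be illustrated with a small sketch. The function name, image size, and Gaussian width below are illustrative assumptions, not the authors' code: for one joint it renders a 2D Gaussian heat map peaked at the joint's pixel location and a per-pixel unit vector field pointing toward the joint.

```python
import numpy as np

def joint_heatmap_and_vectors(joint_uv, size=64, sigma=2.0):
    """Illustrative dense targets for one joint (hypothetical helper,
    not the paper's implementation): a 2D Gaussian heat map and a
    unit 2D directional vector field pointing toward the joint."""
    u, v = joint_uv
    ys, xs = np.mgrid[0:size, 0:size].astype(np.float32)
    dx, dy = u - xs, v - ys
    dist2 = dx ** 2 + dy ** 2
    heatmap = np.exp(-dist2 / (2.0 * sigma ** 2))        # peaks at the joint
    norm = np.sqrt(dist2) + 1e-8                         # avoid divide-by-zero
    vectors = np.stack([dx / norm, dy / norm], axis=-1)  # unit vectors toward joint
    return heatmap, vectors

hm, vec = joint_heatmap_and_vectors((20, 30))
print(hm.shape, vec.shape)  # (64, 64) (64, 64, 2)
```

The paper additionally uses 3D heat maps and 3D directional vectors; the 2D case above only conveys the general idea of decomposing a pose into dense per-pixel estimations.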

Cited by 159 publications (155 citation statements)
References 55 publications
“…With the abundance of affordable commodity depth cameras, the research literature focused naturally more on estimating 3D hand pose through depth observations (e.g. [62,66,10,36,61]), and many works also explored this problem in multi-view setups [33,65,41,8,31,50]. When it comes to a monocular color input, the problem becomes inherently ill posed due to the increased depth and scale ambiguities, but that did not prevent several researchers [4,9,51,57,63,39] from attempting to solve it in the past albeit with limited results.…”
Section: Introduction
confidence: 99%
“…We can achieve about 220.7 fps on a single GPU, which meets the requirement of real-time applications. Although V2V [5] and [30] achieved the most accurate results, they can only run at 3.5 fps and 27.8 fps, respectively.…”
Section: E. Runtime Analysis
confidence: 99%
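The frame rates quoted in this citation statement correspond to per-frame latencies of 1000/fps milliseconds; the quick conversion below is plain arithmetic on the reported numbers, not taken from either paper.

```python
# Per-frame latency implied by each reported frame rate.
for name, fps in [("cited method", 220.7), ("V2V [5]", 3.5), ("[30]", 27.8)]:
    print(f"{name}: {1000.0 / fps:.1f} ms/frame")
# 220.7 fps is about 4.5 ms/frame; 3.5 fps about 285.7 ms; 27.8 fps about 36.0 ms
```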
“…3, we see that the results of our method are in the range of recent state-of-the-art approaches even using only a small fraction of the labeled real samples. Also note that several of the most recent methods focus on improved input and/or output representations [4,6,20,40], which are orthogonal to our work.…”
Section: Comparison on Full Dataset
confidence: 99%
“…
DISCO Nets [2] (NIPS 2016): 20.7
Crossing Nets [39] (CVPR 2017): 15.5
LSPS [1] (BMVC 2018): 15.4
Weak supervision [22] (CVIU 2017): 14.8
Lie-X [45] (IJCV 2017): 14.5
3DCNN [7] (CVPR 2017): 14.1
REN-9x6x6 [41] (JVCI 2018): 12.7
DeepPrior++ [23] (ICCVw 2017): 12.3
Pose Guided REN [3] (Neurocomputing 2018): 11.8
SHPR-Net [4] (IEEE Access 2018): 10.8
Hand PointNet [6] (CVPR 2018): 10.5
Dense 3D regression [40] (CVPR 2018): 10.2
V2V single model [20] (CVPR 2018): 9.2
V2V ensemble [20] (CVPR 2018): 8.4
Feature mapping [29]
The comparisons in this section are based upon the numbers published by the authors. That is, these comparisons disregard differences in the used data subsamples, models, architectures, and other specificities.…”
Section: ME (mm)
confidence: 99%
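The "ME (mm)" figures compared above are mean per-joint Euclidean errors in millimetres. A minimal sketch of that metric follows; the function name and array shapes are assumptions for illustration, not any paper's evaluation code.

```python
import numpy as np

def mean_joint_error_mm(pred, gt):
    """Mean Euclidean distance in mm, averaged over joints and frames.
    pred, gt: (num_frames, num_joints, 3) arrays of 3D joint positions."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

# Toy check: every predicted joint is offset by (3, 4, 0) mm from ground truth,
# so each per-joint error is 5 mm and the mean error is 5.0.
pred = np.zeros((2, 21, 3))
gt = np.full((2, 21, 3), [3.0, 4.0, 0.0])
print(mean_joint_error_mm(pred, gt))  # 5.0
```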