2017
DOI: 10.48550/arxiv.1711.07399
Preprint

V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map

Abstract: Most of the existing deep learning-based methods for 3D hand and human pose estimation from a single depth map are based on a common framework that takes a 2D depth map and directly regresses the 3D coordinates of keypoints, such as hand or human body joints, via 2D convolutional neural networks (CNNs). The first weakness of this approach is the presence of perspective distortion in the 2D depth map. While the depth map is intrinsically 3D data, many previous methods treat depth maps as 2D images that can dist…
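
The voxel-to-voxel alternative the paper argues for begins by converting the 2D depth map into a 3D volumetric representation instead of regressing coordinates from the 2D image directly. A minimal sketch of that conversion is shown below (NumPy; the grid resolution, cube extent, and centroid-based reference point are illustrative assumptions, not the paper's exact preprocessing):

```python
import numpy as np

def depth_map_to_voxels(depth_mm, fx, fy, cx, cy, cube_mm=250.0, grid=88):
    """Back-project a depth map (values in mm) into a binary occupancy grid.

    fx, fy, cx, cy are the depth camera intrinsics (focal lengths and
    principal point). Returns a (grid, grid, grid) float32 array.
    """
    h, w = depth_mm.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
    z = depth_mm.astype(np.float32)
    valid = z > 0                                      # skip missing depth
    # Pinhole back-projection from pixels to 3D camera coordinates (mm).
    x = (us - cx) * z / fx
    y = (vs - cy) * z / fy
    points = np.stack([x[valid], y[valid], z[valid]], axis=1)
    # Center on the point-cloud centroid (a stand-in for a proper hand/body
    # reference point) and quantize into a cube of side cube_mm.
    centered = points - points.mean(axis=0)
    idx = np.floor((centered + cube_mm / 2.0) / cube_mm * grid).astype(int)
    idx = idx[np.all((idx >= 0) & (idx < grid), axis=1)]
    voxels = np.zeros((grid, grid, grid), dtype=np.float32)
    voxels[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return voxels
```

A grid built this way preserves metric shape, which is what the abstract argues is lost when the perspective-projected depth image is treated as an ordinary 2D image.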

Cited by 10 publications (25 citation statements)
References 49 publications
“…We use all three views (frontal + side views) of the images in the NYU training dataset for training. Although the NYU dataset annotates 36 joints, we use the same 14 joints for evaluation as most earlier works like [18] or [16]. … the transformation parameters.…”
Mean joint errors listed in the excerpted results table:
(method name truncated in excerpt): 15.5 mm
DeepModel [20]: 16.9 mm
DeepPrior [14]: 19.8 mm
DeepPrior++ [15]: 12.3 mm
Feedback [17]: 16.2 mm
Global to Local [13]: 15.6 mm
Hand3D [9]: 17.6 mm
HMDN [22]: 16.3 mm
Pose-REN [8]: 11.8 mm
REN [12]: 12.7 mm
SGN [22]: 15.9 mm
V2V-PoseNet [16]: 8.4 mm
Section: Results on the NYU Dataset
confidence: 99%
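
The millimeter figures quoted above are mean 3D joint position errors on the NYU evaluation joints, i.e. the Euclidean distance between predicted and ground-truth joints averaged over joints and test frames. A minimal sketch of that metric (the array shapes and names are assumptions for illustration):

```python
import numpy as np

def mean_joint_error_mm(pred, gt):
    """Mean Euclidean distance (mm) between predicted and ground-truth
    3D joints, averaged over all joints and frames.

    pred, gt: float arrays of shape (num_frames, num_joints, 3), in mm.
    """
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

# Hypothetical usage with 14 NYU evaluation joints per test frame:
# err = mean_joint_error_mm(pred_joints, gt_joints)   # e.g. about 8.4 for V2V-PoseNet
```
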
“…Our approach performs well, being only two to three millimeters less accurate than the leading approaches. Compared to the best performing approach of mks0601 [16], our approach is conceptually and, especially, computationally much simpler. While the approach presented in [16] carries out inference at 3.5 frames per second on a single GPU, our approach achieves 838 frames per second and therefore has real-time capability.…”
Section: Residual Network for Higher Accuracy
confidence: 99%