2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.01011

Disentangling Latent Hands for Image Synthesis and Pose Estimation

Abstract: Hand image synthesis and pose estimation from RGB images are both highly challenging tasks due to the large discrepancy between factors of variation ranging from image background content to camera viewpoint. To better analyze these factors of variation, we propose the use of disentangled representations and a disentangled variational autoencoder (dVAE) that allows for specific sampling and inference of these factors. The derived objective from the variational lower bound as well as the proposed training strate…

Cited by 125 publications (101 citation statements)
References 32 publications
“…They usually require a good initialization; otherwise they are susceptible to getting stuck in local minima. Discriminative methods learn a direct mapping from visual observations to hand poses [23,27,10,13,31,2]. Thanks to large-scale annotated datasets [31,29,23], deep learning-based discriminative methods have shown very strong performance in the hand pose estimation task.…”
Section: Related Work
confidence: 99%
“…To make the 3D pose annotations consistent for RHD, we follow [31,2] and modify the palm joint in STB to the wrist point. Similar to [31,2,19,27], we use 10 sequences for training and the other 2 for testing.…”
Section: Datasets and Evaluation Metrics
confidence: 99%
“…[2] improves the robustness of pose estimation methods by synthesizing more images from the augmented skeletons, which is achieved by obtaining more unseen skeletons instead of leveraging the unseen combinations of the specified factor (pose) and unspecified factors (background) in the existing dataset like ours. The most related work is [57], which proposes a disentangled VAE to learn the specified (pose) and additional (appearance) factors. However, our method explicitly makes the appearance factor orthogonal to the pose during the training process, while [2] only guarantees that the pose factor does not contain information about the image contents.…”
Section: Related Work
confidence: 99%