HandVoxNet: Deep Voxel-Based Network for 3D Hand Shape and Pose Estimation From a Single Depth Map

Malik, Jameel; Abdelaziz, Ibrahim; Elhayek, Ahmed; Shimada, Soshi; Ali, Sk Aziz; Golyanik, Vladislav; Theobalt, Christian; Stricker, Didier

doi:10.1109/cvpr42600.2020.00714

Cited by 70 publications

(64 citation statements)

References 32 publications

“…In this section, we focus on the related works that reconstruct both hand and object from monocular input. We refer the reader to [11,12] for a detailed overview of works focusing on the reconstruction of hands and objects in isolation.…”

Section: Related Work (mentioning)

confidence: 99%

“…[11, 38-41] utilized only depth maps instead of RGB images for estimating hand poses. However, many of the existing approaches focus on predicting the hand pose and aim to be stable in the presence of objects, but do not address the problem of simultaneous estimation of both hand and object.…”

Section: B. Hand-object Pose and Shape Estimation (mentioning)

confidence: 99%

“…The advantage of this representation is two-fold. First, depth map is inherently a 2.5D data which can be better represented in a 3D voxelized grid using binary quantization (occupancy grid) [11,38] or TSDF [51]. The TSDF-based representation is more effective than occupancy grid because TSDF allows to better encode the depth information by recognizing the voxels before and behind the observed surface [51].…”

Section: Figure (mentioning)

confidence: 99%
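
The contrast drawn in the statement above, between a binary occupancy grid and a TSDF, can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration and is not taken from HandVoxNet or from the citing paper: the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the cube size, the 8×8×8 resolution, the truncation distance, and the use of a simple projective TSDF (signed distance measured along the camera z-axis, positive in front of the surface). The input is a synthetic flat surface rather than a hand.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project valid (non-zero) depth pixels to 3D camera-space points."""
    v, u = np.nonzero(depth > 0)
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

def occupancy_grid(points, center, cube_size, res):
    """Binary grid: 1 where at least one surface point falls inside the voxel."""
    step = cube_size / res
    idx = np.floor((points - (center - cube_size / 2)) / step).astype(int)
    inside = np.all((idx >= 0) & (idx < res), axis=1)
    idx = idx[inside]
    grid = np.zeros((res, res, res), dtype=np.float32)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return grid

def projective_tsdf(depth, fx, fy, cx, cy, center, cube_size, res, trunc):
    """Truncated signed distance along the camera z-axis at each voxel centre.

    Assumes the whole cube lies in front of the camera (z > 0).
    """
    step = cube_size / res
    axis = (np.arange(res) + 0.5) * step - cube_size / 2
    gx, gy, gz = np.meshgrid(axis + center[0], axis + center[1],
                             axis + center[2], indexing="ij")
    # Project every voxel centre into the depth image.
    u = np.round(gx * fx / gz + cx).astype(int)
    v = np.round(gy * fy / gz + cy).astype(int)
    h, w = depth.shape
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    d = np.zeros_like(gz)
    d[valid] = depth[v[valid], u[valid]]
    valid &= d > 0
    tsdf = np.ones_like(gz)  # voxels without a depth sample count as free space
    sdf = (d - gz) / trunc   # > 0 in front of the surface, < 0 behind it
    tsdf[valid] = np.clip(sdf[valid], -1.0, 1.0)
    return tsdf

# Synthetic example: a flat surface 0.51 m from the camera (illustrative values).
fx = fy = 100.0
cx = cy = 32.0
depth = np.full((64, 64), 0.51)
center = np.array([0.0, 0.0, 0.5])
cube_size, res, trunc = 0.2, 8, 0.05

points = depth_to_points(depth, fx, fy, cx, cy)
occ = occupancy_grid(points, center, cube_size, res)
tsdf = projective_tsdf(depth, fx, fy, cx, cy, center, cube_size, res, trunc)

# Occupancy marks only the surface slice; TSDF also separates front from back.
print("occupancy along z:", occ[0, 0, :])
print("tsdf along z:     ", np.round(tsdf[0, 0, :], 2))
```

In this toy setup the occupancy grid gives the same value (0) to voxels in front of and behind the surface, while the TSDF assigns them opposite signs, which is the property the quoted statement attributes to TSDF.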

Graph-Based Hand-Object Meshes and Poses Reconstruction With Multi-Modal Input

et al. 2021

Self Cite

Estimating the hand-object meshes and poses is a challenging computer vision problem with many practical applications. In this paper, we introduce a simple yet efficient hand-object reconstruction algorithm. To this end, we exploit the fact that both the poses and the meshes are graphs-based representations of the hand-object with different levels of details. This allows taking advantage of the powerful Graph Convolution networks (GCNs) to build a coarse-to-fine Graph-based hand-object reconstruction algorithm. Thus, we start by estimating a coarse graph that represents the 2D hand-object poses. Then, more details (e.g. third dimension and mesh vertices) are gradually added to the graph until it represents the dense 3D hand-object meshes. This paper also explores the problem of representing the RGBD input in different modalities (e.g. voxelized RGBD). Hence, we adopted a multi-modal representation of the input by combining 3D representation (i.e. voxelized RGBD) and 2D representation (i.e. RGB only). We include intensive experimental evaluations that measure the ability of our simple algorithm to achieve state-of-the-art accuracy on the most challenging datasets (i.e. HO-3D and FPHAB).

Index Terms: Hand pose estimation, hand shape estimation, hand-object interaction, graph convolution, machine learning.

“…When considering any application for human-computer interaction like UAV control with hand gestures then it required accurate 3D hand pose estimation with key points and gestures recognition at the joint level which has various degrees of freedom (DoF) [1].…”

Section: Introduction (mentioning)

confidence: 99%

“…The analysis of 3D hand gestures used the most current methods for estimating 3D locations from monocular RGB images by understanding hand key points but unable to describe the 3D shape of a hand. In recent years, the pose estimation tasks have massive advancement and this can be accredited to key developments in the field of deep learning and a decrease in the cost of depth sensors [1]. However, the specified problem may exist to face many challenging factors.…”

Section: Introduction (mentioning)

confidence: 99%

3D Hand Gestures Segmentation and Optimized Classification Using Deep Learning

et al. 2021

Hand gestures recognition system has massive applications which are mainly utilized in robotics and computer vision specially to control Unmanned Aerial Vehicles (UAV). These methods bypass the presence of electronic control to UAVs and provide an ease to the operators. In this paper, we present a method for 3D hand gestures segmentation and classification by combining MASK-RCNN with Grass Hopper Optimization. We created a private 3D and RGB hand gestures dataset using Intel Kinetic and Intel RealSense D435i camera, then proposed a model for RGB hand gestures to estimate the key points using human kinematics, the key points later then utilize to get the best degree of freedom (DoF). The grass hopper optimization besides minimum distance function was applied to achieve the finest deep features from the 3D hand gestures dataset. The ResNet50 network is used as the backbone to calculate the Overlap Coefficient (OC) for segmentation and the ResNet50, ResNet101 networks to calculate the classification for 3D hand gestures. The classification accuracy achieved on the private dataset is 99.05% and 99.29% on public Microsoft Kinect and Leap Motion dataset where the OC are 88.16% and 88.19% respectively.

HTML: A Parametric Hand Texture Model for 3D Hand Reconstruction and Personalization

Qian, Wang, Mueller, et al. 2020

Lecture Notes in Computer Science

Self Cite

3D hand reconstruction from images is a widely-studied problem in computer vision and graphics, and has a particularly high relevance for virtual and augmented reality. Although several 3D hand reconstruction approaches leverage hand models as a strong prior to resolve ambiguities and achieve more robust results, most existing models account only for the hand shape and poses and do not model the texture. To fill this gap, in this work we present HTML, the first parametric texture model of human hands. Our model spans several dimensions of hand appearance variability (e.g., related to gender, ethnicity, or age) and only requires a commodity camera for data acquisition. Experimentally, we demonstrate that our appearance model can be used to tackle a range of challenging problems such as 3D hand reconstruction from a single monocular image. Furthermore, our appearance model can be used to define a neural rendering layer that enables training with a self-supervised photometric loss. We make our model publicly available.
