2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)
DOI: 10.1109/iccvw.2019.00348

3D Hand Pose Estimation from RGB Using Privileged Learning with Depth Data

Abstract: This paper proposes a method for hand pose estimation from RGB images that uses both external large-scale depth image datasets and paired depth and RGB images as privileged information at training time. We show that providing depth information during training significantly improves the performance of pose estimation from RGB images during testing. We explore different ways of using this privileged information: (1) using depth data to initially train a depth-based network, (2) using the features from the depth-based…

Cited by 27 publications (16 citation statements); references 53 publications (151 reference statements).
“…Some HPE methods [26], [27] boost the performance of RGB-based HPE with the help of privileged learning of depth information. In [26], a depth regularizer network is applied after the 3D HPE network during training to learn to generate the corresponding depth map from a 3D hand pose.…”
Section: B. Existing HPE Methods
Mentioning confidence: 99%
“…Similarly, in [41] the network learns to generate the corresponding depth map from the 3D hand shape instead of the pose. In [27], an RGB-based HPE and a depth-based HPE network are independently trained. The depth-based network is then frozen and the RGB-based network's training is resumed with paired RGB and depth images by sharing the information between the middle CNN layers of these two networks.…”
Section: B. Existing HPE Methods
Mentioning confidence: 99%
“…Yuan et al. [128] were among the first to utilize depth data during training, employing a two-stage training strategy to estimate poses from each modality with two CNNs. Initially, they regressed 3D joint locations with the depth-based network; in the second stage, they froze its parameters and trained the RGB-based network on paired images.…”
Section: Unimodal Inference
Mentioning confidence: 99%
“…Srinivas and Fleuret [41] improved it by applying Jacobian matching to networks. Recently, cross-modal knowledge distillation [14,50,54] extended knowledge distillation by applying it to transferring knowledge across different modalities. Our approach generalizes cross-modal knowledge distillation to target datasets where superior modalities are missing.…”
Section: Related Work
Mentioning confidence: 99%
“…Leveraging multi-modal knowledge to boost the performance of classic computer vision problems, such as classification [28,35,50], object detection [14,39,51] and gesture recognition [1,7,40,44,54,59,60], has emerged as a promising research field in recent years. Current paradigms for transferring knowledge across modalities involve aligning feature representations from multiple modalities of data during training, and then improving the performance of a unimodal system during testing with the aligned feature representations.…”
Section: Introduction
Mentioning confidence: 99%