2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019
DOI: 10.1109/cvpr.2019.01115

Weakly-Supervised Discovery of Geometry-Aware Representation for 3D Human Pose Estimation

Abstract: Recent studies have shown remarkable advances in 3D human pose estimation from monocular images, with the help of large-scale in-door 3D datasets and sophisticated network architectures. However, the generalizability to different environments remains an elusive goal. In this work, we propose a geometry-aware 3D representation for the human pose to address this limitation by using multiple views in a simple auto-encoder model at the training stage and only 2D keypoint information as supervision. A view synthesis…

citations
Cited by 111 publications
(58 citation statements)
references
References 40 publications
“…3D poses are obtained by converting 2D poses to 3D one through epipolar geometry [10]. The 2D pose is converted into a 3D pose by a trained deep learning model [11]. A study evaluated whether the 3D poses of the user's motions obtained by the deep learning model are actually representative of the user's motions, and thus, confirmed that useful 3D poses are provided [19].…”
Section: Landmark Extraction Using Supervised Learning (mentioning)
confidence: 94%
“…The deep learning model analyzes RGB images and provides 3D poses of the user's motions [10,11]. A 2D pose is calculated by a pre-trained deep learning model using the RGB image.…”
Section: Landmark Extraction Using Supervised Learning (mentioning)
confidence: 99%
“…In addition to using temporal information, if the action was captured from multiple views, information from different views may be complementary. Chen et al [6] extracted features from two different views, and transformed the features from one view into another. If features from different views are transformed well, these features can be used to estimate better 3D human pose.…”
Section: B. Deep-based 3D Human Pose Estimation (mentioning)
confidence: 99%
“…Pavllo et al [17] proposed a temporal convolution model to take 2D skeleton sequences as input to estimate 3D human poses. To utilize rich information from the spatial domain, Chen et al [6] proposed to jointly consider videos captured from two different views. 2D skeletons are first detected from each view.…”
Section: Introduction (mentioning)
confidence: 99%