2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.00127

Learning to Reconstruct People in Clothing From a Single RGB Camera

[Figure 1 caption: We present a deep learning based approach to estimate personalized body shape, including hair and clothing, using a single RGB camera. The shapes shown above were computed from only 8 input images and re-posed using SMPL.]

Abstract: We present Octopus, a learning-based model to infer the personalized 3D shape of people from a few frames (1-8) of a monocular video in which the person is moving, with a reconstruction accuracy of 4 to 5 mm, while being orders of magnitude faster than previous method…

Cited by 337 publications (270 citation statements) · References 85 publications
“…Single-photo body estimation methods typically bottleneck through fixed intermediate representations, which, while enabling piecewise modeling, ultimately limit the amount of achievable detail. Some methods bottleneck through segmented images [21,15,38,10,33], others through estimated keypoint positions [7,26], and some through both [44,34,14,1]. All such methods permit too much ambiguity to allow for dense surface reconstruction.…”
Section: Related Work (mentioning)
confidence: 99%
“…Additionally, in Figure 5 we show a comparison of our single-view method to the monocular video method of [6]. While our results look comparable and visually pleasing, it should be noted that the method in [6] produces higher-resolution reconstructions. The advantage of our method is that it requires just a single view.…”
Section: Dataset (mentioning)
confidence: 73%
“…MGN: Multi-Garment Net. The input to the model is a set of semantically segmented images, $\mathcal{I} = \{I_0, I_1, \ldots, I_{F-1}\}$, and corresponding 2D joint estimates, $\mathcal{J} = \{J_0, J_1, \ldots, J_{F-1}\}$, where $F$ is the number of images used to make the prediction. Following [20,3], we abstract away the appearance information in RGB images and extract semantic garment segmentation [20] to reduce the risk of over-fitting, albeit at the cost of disregarding useful shading signal. For simplicity, let $\theta$ now denote both the joint angles $\theta$ and the translation $t$.…”
Section: From Images to Garments (mentioning)
confidence: 99%
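To make this input layout concrete, the sketch below bundles the $F$ segmented frames with their per-frame 2D joints. This is an illustrative assumption on our part: the class name, field names, and the 25-joint OpenPose-style convention are hypothetical and not taken from the Multi-Garment Net code.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class MGNInput:
    """Hypothetical container for Multi-Garment Net style inputs:
    F semantically segmented frames plus per-frame 2D joint estimates."""
    segmentations: np.ndarray  # (F, H, W) integer garment labels per pixel
    joints_2d: np.ndarray      # (F, J, 2) pixel coordinates of J keypoints

    def __post_init__(self):
        # Both inputs must cover the same F frames.
        assert self.segmentations.shape[0] == self.joints_2d.shape[0]

# Example: 8 frames, 256x256 segmentation maps, 25 OpenPose-style joints
example = MGNInput(
    segmentations=np.zeros((8, 256, 256), dtype=np.int64),
    joints_2d=np.zeros((8, 25, 2), dtype=np.float32),
)
```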
“…• 2D segmentation loss: Unlike [3], we do not optimize silhouette overlap; instead, we jointly optimize the projected per-garment segmentation against the input segmentation mask. This ensures that each garment explains its corresponding mask in the image:…”
Section: Loss Functions (mentioning)
confidence: 99%
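A minimal sketch of such a per-garment segmentation loss, assuming a differentiable renderer yields soft garment masks in [0, 1]; the function name and the binary cross-entropy formulation are our assumptions, not necessarily the exact loss used in the cited work.

```python
import torch
import torch.nn.functional as F

def per_garment_segmentation_loss(rendered_masks: torch.Tensor,
                                  target_segmentation: torch.Tensor) -> torch.Tensor:
    """Compare each projected garment mask to the input segmentation.

    rendered_masks:      (G, H, W) soft masks in [0, 1], one per garment,
                         e.g. produced by a differentiable renderer.
    target_segmentation: (H, W) integer labels, 0 = background,
                         1..G = garment classes.
    """
    loss = rendered_masks.new_zeros(())
    num_garments = rendered_masks.shape[0]
    for g in range(num_garments):
        # Binary target: pixels belonging to garment g (label g + 1).
        target = (target_segmentation == g + 1).float()
        # Each garment must explain exactly its own mask in the image.
        loss = loss + F.binary_cross_entropy(rendered_masks[g], target)
    return loss / num_garments
```

Optimizing per-garment masks rather than a single silhouette penalizes a garment that bleeds into its neighbor's region, which a plain silhouette-overlap term would not notice.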