2019
DOI: 10.1007/978-3-030-11018-5_6

3D Human Body Reconstruction from a Single Image via Volumetric Regression

Abstract: This paper proposes the use of an end-to-end Convolutional Neural Network for direct reconstruction of the 3D geometry of humans via volumetric regression. The proposed method does not require the fitting of a shape model and can be trained to work from a variety of input types, whether it be landmarks, images or segmentation masks. Additionally, non-visible parts, either self-occluded or otherwise, are still reconstructed, which is not the case with depth map regression. We present results that show that our …

Citations: cited by 68 publications (70 citation statements)
References: 32 publications

“…Implementation Details. The image encoders for both the low-resolution and high-resolution levels use a stacked hourglass network [31] with 4 and 1 stacks respectively, using the modification suggested by [16] and batch normalization replaced with group normalization [45]. Note that the fine image encoder removes one downsampling operation to achieve large feature embedding resolution.…”
Section: Results (mentioning)
confidence: 99%
“…The MLP for the fine-level image encoder has the number of neurons of (272, 512, 256, 128, 1) with skip connections at second and third layers. Note 1 https://hdrihaven.com/ 16 , resulting in the input channel size of 272 in total. The coarse PIFu module is pre-trained with the input image resized to 512 × 512 and a batch size of 8.…”
Section: Results (mentioning)
confidence: 99%
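
The layer widths quoted in the statement above can be made concrete with a small sketch. This is not the citing authors' code: it assumes that "skip connections at second and third layers" means the original 272-channel input is concatenated to the input of layers 2 and 3, and it assumes a leaky ReLU between layers and a sigmoid on the final output. The names `init_params` and `forward` are hypothetical, and the weights are random.

```python
import numpy as np

# Layer widths quoted in the citation statement.
WIDTHS = (272, 512, 256, 128, 1)
# Assumption: layers 2 and 3 (1-based) also receive the original
# 272-channel input, concatenated to their regular input.
SKIP_LAYERS = (2, 3)


def init_params(widths=WIDTHS, skip_layers=SKIP_LAYERS, seed=0):
    """Create random (weight, bias) pairs; skip layers get a wider input."""
    rng = np.random.default_rng(seed)
    params = []
    for i in range(len(widths) - 1):
        layer = i + 1
        fan_in = widths[i] + (widths[0] if layer in skip_layers else 0)
        w = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, widths[i + 1]))
        b = np.zeros(widths[i + 1])
        params.append((w, b))
    return params


def forward(params, x, skip_layers=SKIP_LAYERS):
    """Map per-point features of shape (N, 272) to values of shape (N, 1)."""
    x0 = x
    h = x
    last = len(params)
    for layer, (w, b) in enumerate(params, start=1):
        if layer in skip_layers:
            # Skip connection: re-inject the original input features.
            h = np.concatenate([h, x0], axis=-1)
        h = h @ w + b
        if layer < last:
            h = np.where(h > 0, h, 0.01 * h)  # leaky ReLU (assumed)
        else:
            h = 1.0 / (1.0 + np.exp(-h))      # sigmoid output (assumed)
    return h


if __name__ == "__main__":
    features = np.random.default_rng(1).normal(size=(5, 272))
    print(forward(init_params(), features).shape)
```

Under this reading, the four weight matrices have input sizes 272, 784 (512 + 272), 528 (256 + 272), and 128.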
“…Regarding single-view human model reconstruction, there are only two recent works by Varol et al [64] and Jackson et al [26]. In the former study, the 3D human datasets used for the training process are essentially synthesized human imagery textured over SMPL models (lacking geometry details), leading to SMPL-like voxel geometries in their outputs.…”
Section: Related Work (mentioning)
confidence: 99%
“…BodyNet [35] is an end-to-end network that infers volumetric body shape from a single image. By extending a face reconstruction network [37], Jackson et al [38] also propose a volume-based human shape reconstruction method. These two methods present 3D objects as voxel representation rather than mesh.…”
Section: B Non-parametric Approach (mentioning)
confidence: 99%