Human Shape from Silhouettes Using Generative HKS Descriptors and Cross-Modal Neural Networks

Dibra, Endri; Jain, Himanshu; Öztireli, Cengiz; Ziegler, Remo; Groß, Markus

doi:10.1109/cvpr.2017.584

Cited by 89 publications

(90 citation statements)

References 57 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…The SiC-loPe [31] system tolerates greater clothing variation than our system, but its geometric detail is limited by the use of intermediate silhouettes. To the credit of these works, all but [10] were designed for capturing bodies "in the wild" with tolerance for pose variation, whereas our goal is to capture a detailed avatar from a restricted pose.…”

Section: Related Workmentioning

confidence: 99%

FACSIMILE: Fast and Accurate Scans From an Image in Less Than a Second

Smith

Loper

et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

View full text Add to dashboard Cite

a b c d e Figure 1: FAX converts a single RGB image (a) into a scan (b, d) with albedo texture (c, e) AbstractCurrent methods for body shape estimation either lack detail or require many images. They are usually architecturally complex and computationally expensive. We propose FACSIMILE (FAX), a method that estimates a detailed body from a single photo, lowering the bar for creating virtual representations of humans. Our approach is easy to implement and fast to execute, making it easily deployable. FAX uses an image-translation network which recovers geometry at the original resolution of the image. Counterintuitively, the main loss which drives FAX is on per-pixel surface normals instead of per-pixel depth, making it possible to estimate detailed body geometry without any depth supervision. We evaluate our approach both qualitatively and quantitatively, and compare with a state-of-the-art method.

show abstract

Section: Related Workmentioning

confidence: 99%

FACSIMILE: Fast and Accurate Scans From an Image in Less Than a Second

Smith

Loper

et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

View full text Add to dashboard Cite

show abstract

“…Dibra et al [106] used an encoder followed by three fully connected layers which regress the SCAPE parameters from one or multiple silhouette images. Later, Dibra et al [107] first learn a common embedding of 2D silhouettes and 3D human body shapes (see Section 7.3.1). The latter are represented using their Heat Kernel Signatures [108].…”

Section: D Human Body Reconstructionmentioning

confidence: 99%

Image-Based 3D Object Reconstruction: State-of-the-Art and Trends in the Deep Learning Era

Han

Laga

Bennamoun

2021

IEEE Trans. Pattern Anal. Mach. Intell.

321

150

View full text Add to dashboard Cite

3D reconstruction is a longstanding ill-posed problem, which has been explored for decades by the computer vision, computer graphics, and machine learning communities. Since 2015, image-based 3D reconstruction using convolutional neural networks (CNN) has attracted increasing interest and demonstrated an impressive performance. Given this new era of rapid evolution, this article provides a comprehensive survey of the recent developments in this field. We focus on the works which use deep learning techniques to estimate the 3D shape of generic objects either from a single or multiple RGB images. We organize the literature based on the shape representations, the network architectures, and the training mechanisms they use. While this survey is intended for methods which reconstruct generic objects, we also review some of the recent works which focus on specific object classes such as human body shapes and faces. We provide an analysis and comparison of the performance of some key papers, summarize some of the open problems in this field, and discuss promising directions for future research.

show abstract

“…The most practical but also challenging setting is capturing from a single monocular RGB camera. Some methods attempt to infer the shape parameters of a body model from a single image [41,53,10,24,8,32,86,39,55], but reconstructed detail is constrained to the model shape space, and thus does not capture personalized shape detail and clothing geometry. Recent work [6,5] estimates more detailed shape, including clothing, from a video sequence of a person rotating in front of a camera while holding a rough A-pose.…”

Section: Introductionmentioning

confidence: 99%

Learning to Reconstruct People in Clothing From a Single RGB Camera

Alldieck

Magnor

Bhatnagar

et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

337

266

View full text Add to dashboard Cite

Figure 1: We present a deep learning based approach to estimate personalized body shape, including hair and clothing, using a single RGB camera. The shapes shown above have been calculated using only 8 input images, and re-posed using SMPL. AbstractWe present Octopus, a learning-based model to infer the personalized 3D shape of people from a few frames (1-8) of a monocular video in which the person is moving with a reconstruction accuracy of 4 to 5mm, while being orders of magnitude faster than previous methods. From semantic segmentation images, our Octopus model reconstructs a 3D shape, including the parameters of SMPL plus clothing and hair in 10 seconds or less. The model achieves fast and accurate predictions based on two key design choices. First, by predicting shape in a canonical T-pose space, the network learns to encode the images of the person into poseinvariant latent codes, where the information is fused. Second, based on the observation that feed-forward predictions are fast but do not always align with the input images, we predict using both, bottom-up and top-down streams (one per view) allowing information to flow in both directions. Learning relies only on synthetic 3D data. Once learned, Octopus can take a variable number of frames as input, and is able to reconstruct shapes even from a single image with an accuracy of 5mm. Results on 3 different datasets demonstrate the efficacy and accuracy of our approach. Code is available at [2]. * Work partly conducted during an internship at the Real Virtual Humans group of Max Planck for Informatics.

show abstract

Human Shape from Silhouettes Using Generative HKS Descriptors and Cross-Modal Neural Networks

Cited by 89 publications

References 57 publications

FACSIMILE: Fast and Accurate Scans From an Image in Less Than a Second

FACSIMILE: Fast and Accurate Scans From an Image in Less Than a Second

Image-Based 3D Object Reconstruction: State-of-the-Art and Trends in the Deep Learning Era

Learning to Reconstruct People in Clothing From a Single RGB Camera

Contact Info

Product

Resources

About