Mesoscopic Facial Geometry Inference Using Deep Neural Networks

Huynh, Loc; Chen, Weikai; Saito, Shunsuke; Xing, Junliang; Nagano, Koki; Jones, Andrew; Debevec, Paul; Li, Hao

doi:10.1109/cvpr.2018.00877

Cited by 69 publications

(40 citation statements)

References 35 publications

“…With the availability of large-scale 3D shape dataset [5], learning-based approaches [43,12,15] are able to consider single or few images thanks to the shape prior learned from the data. To simplify the learning problem, recent works reconstruct 3D shape via predicting intermediate 2.5D representations, such as depth map [25], image collections [18], displacement map [16] or normal map [36,44]. Pose estimation is another key task to understanding the visual environment.…”

Section: Related Work (mentioning)

confidence: 99%

Soft Rasterizer: A Differentiable Renderer for Image-Based 3D Reasoning

Liu, Chen, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

Self Cite

Rendering bridges the gap between 2D vision and 3D scenes by simulating the physical process of image formation. By inverting such renderer, one can think of a learning approach to infer 3D information from 2D images. However, standard graphics renderers involve a fundamental discretization step called rasterization, which prevents the rendering process to be differentiable, hence able to be learned. Unlike the state-of-the-art differentiable renderers [29,19], which only approximate the rendering gradient in the back propagation, we propose a truly differentiable rendering framework that is able to (1) directly render colorized mesh using differentiable functions and (2) back-propagate efficient supervision signals to mesh vertices and their attributes from various forms of image representations, including silhouette, shading and color images. The key to our framework is a novel formulation that views rendering as an aggregation function that fuses the probabilistic contributions of all mesh triangles with respect to the rendered pixels. Such formulation enables our framework to flow gradients to the occluded and far-range vertices, which cannot be achieved by the previous state-of-the-arts. We show that by using the proposed renderer, one can achieve significant improvement in 3D unsupervised single-view reconstruction both qualitatively and quantitatively. Experiments also demonstrate that our approach is able to handle the challenging tasks in image-based shape fitting, which remain nontrivial to existing differentiable renderers. Code is available at https://github.com/ShichenLiu/SoftRas.

“…Network architectures. Both our silhouette synthesis network and the front-to-back synthesis network follow the U-Net network architecture in [22,55,21,49,47] with an input channel size of 7 and 4, respectively. All the weights in these networks are initialized based on Gaussian distribution.…”

Section: Implementation Details (mentioning)

confidence: 99%

SiCloPe: Silhouette-Based Clothed People

Natsume, Saito, Huang, et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Self Cite

We introduce a new silhouette-based representation for modeling clothed human bodies using deep generative models. Our method can reconstruct a complete and textured 3D model of a person wearing clothes from a single input picture. Inspired by the visual hull algorithm, our implicit representation uses 2D silhouettes and 3D joints of a body pose to describe the immense shape complexity and variations of clothed people. Given a segmented 2D silhouette of a person and its inferred 3D joints from the input picture, we first synthesize consistent silhouettes from novel view points around the subject. The synthesized silhouettes which are the most consistent with the input segmentation are fed into a deep visual hull algorithm for robust 3D shape prediction. We then infer the texture of the subject's back view using the frontal image and segmentation mask as input to a conditional generative adversarial network. Our experiments demonstrate that our silhouette-based model is an effective representation and the appearance of the back view can be predicted reliably using an image-to-image translation network. While classic methods based on parametric models often fail for single-view images of subjects with challenging clothing, our approach can still produce successful results, which are comparable to those obtained from multi-view input.

“…The digital embodiment of the VR HMD user output from our system is restricted to the frontal face region and still has significant room for improvement. A more compelling full head embodiment could be constructed by modelling hair [71], texture [46] and shape details [34], [45].…”

Section: Results and Analysis (mentioning)

confidence: 99%

“…The state-of-the-art [43] that integrates a convolutional encoder network with an expert-designed generative model does not require any 3D facial data for training, while is still able to output promising reconstructions. Recovery of facial geometry and texture details using deep neural networks is also an interesting direction [45], [46].…”

Section: B. 3D Face Reconstruction From a Single Image (mentioning)

confidence: 99%

Realistic Facial Expression Reconstruction for VR HMD Users

Lou, Wang, Nduka, et al. 2020

IEEE Trans. Multimedia

We present a system for sensing and reconstructing facial expressions of the virtual reality (VR) head-mounted display (HMD) user. The HMD occludes a large portion of the user's face, which makes most existing facial performance capturing techniques intractable. To tackle this problem, a novel hardware solution with electromyography (EMG) sensors being attached to the headset frame is applied to track facial muscle movements. For realistic facial expression recovery, we first reconstruct the user's 3D face from a single image and generate the personalized blendshapes associated with seven facial action units (AUs) on the most emotionally salient facial parts (ESFPs). We then utilize preprocessed EMG signals for measuring activations of AU-coded facial expressions to drive pre-built personalized blendshapes. Since facial expressions appear as important nonverbal cues of the subject's internal emotional states, we further investigate the relationship between six basic emotions (anger, disgust, fear, happiness, sadness and surprise) and detected AUs using a fern classifier. Experiments show the proposed system can accurately sense and reconstruct high-fidelity common facial expressions while providing useful information regarding the emotional state of the HMD user.
