Automatic acquisition of high-fidelity facial performances using monocular videos

Shi, Fuhao; Wu, Hsiang-Tao; Tong, Xin; Chai, Jinxiang

doi:10.1145/2661229.2661290

Cited by 126 publications

(106 citation statements)

References 44 publications

Mentioning: 104


“…Recently, 3D face priors, dense flow, and shape from shading methods have been introduced to achieve higher fidelity 3D tracking from unconstrained monocular videos [Garrido et al. 2013; Shi et al. 2014]. Using dimension-reduced linear models and sufficient prior facial data, a single camera can be used to generate compelling facial animation in real-time without any calibration [Cao et al. 2014].…”

Section: Previous Work (mentioning)

confidence: 99%
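The quoted passage mentions "dimension-reduced linear models" of the face. Such models are commonly built with principal component analysis (PCA) over registered meshes: a shape is the mean plus a small number of basis directions. The sketch below is a generic illustration of that idea, assuming synthetic training data and hypothetical names (`fit_linear_model`, `project`, `reconstruct`); it is not the specific model of Cao et al. 2014.

```python
import numpy as np

def fit_linear_model(shapes, k):
    """Fit a k-dimensional PCA model to training shapes.

    shapes: (m, 3n) array, one flattened mesh with n vertices per row.
    Returns the mean shape (3n,) and an orthonormal basis (3n, k).
    """
    mean = shapes.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(shapes - mean, full_matrices=False)
    return mean, vt[:k].T

def project(shape, mean, basis):
    """Coefficients of the best approximation in the span of the (orthonormal) basis."""
    return basis.T @ (shape - mean)

def reconstruct(coeffs, mean, basis):
    """Linear model: shape = mean + basis @ coeffs."""
    return mean + basis @ coeffs

rng = np.random.default_rng(0)
# Hypothetical data: 50 "meshes" of 100 vertices that vary along 5 directions only.
true_basis = rng.standard_normal((300, 5))
shapes = rng.standard_normal((50, 5)) @ true_basis.T + rng.standard_normal(300)

mean, basis = fit_linear_model(shapes, k=5)
c = project(shapes[0], mean, basis)
approx = reconstruct(c, mean, basis)
print(c.shape, np.allclose(approx, shapes[0]))  # → (5,) True
```

Because the synthetic shapes truly lie in a 5-dimensional affine subspace, five coefficients reproduce them exactly; with real scans, k trades reconstruction error against the number of parameters a tracker must estimate per frame.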

Facial performance sensing head-mounted display

et al. 2015


To enable immersive face-to-face communication in virtual worlds, a user's facial expressions have to be captured while the user wears a virtual reality head-mounted display. Because the face is largely occluded by typical wearable displays, we have designed an HMD that combines ultra-thin strain sensors with a head-mounted RGB-D camera for real-time facial performance capture and animation.

Abstract: There are currently no solutions for enabling direct face-to-face interaction between virtual reality (VR) users wearing head-mounted displays (HMDs). The main challenge is that the headset obstructs a significant portion of a user's face, preventing effective facial capture with traditional techniques. To advance virtual reality as a next-generation communication platform, we develop a novel HMD that enables 3D facial performance-driven animation in real-time. Our wearable system uses ultra-thin flexible electronic materials, mounted on the foam liner of the headset, to measure surface strain signals corresponding to upper-face expressions. These strain signals are combined with a head-mounted RGB-D camera to enhance tracking in the mouth region and to account for inaccurate HMD placement. To map the input signals to a 3D face model, we perform a single-instance offline training session for each person. For reusable and accurate online operation, we propose a short calibration step that readjusts the Gaussian mixture distribution of the mapping before each use. The resulting animations are visually on par with cutting-edge depth-sensor-driven facial performance capture systems and are therefore suitable for social interactions in virtual worlds.
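The abstract maps strain signals to a face model through a Gaussian mixture and recalibrates that mixture before each use. One standard way to turn a joint Gaussian mixture over (input x, output y) into a mapping is Gaussian mixture regression (GMR), which returns E[y | x]; the sketch below assumes GMR is the kind of mapping meant. The abstract does not say how the distribution is readjusted, so `recalibrate` is purely hypothetical: it shifts every component's input mean by the offset between a short calibration recording and the training data.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Density of a multivariate normal N(mean, cov) at x."""
    d = x.shape[0]
    diff = x - mean
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm

def gmr_predict(x, weights, means, covs, dx):
    """Conditional expectation E[y | x] under a joint GMM over (x, y).

    means: (K, dx + dy); covs: (K, dx + dy, dx + dy); the first dx entries are x.
    """
    # Responsibilities use only the input marginal of each component.
    resp = np.array([w * gaussian_pdf(x, m[:dx], c[:dx, :dx])
                     for w, m, c in zip(weights, means, covs)])
    resp /= resp.sum()
    y = 0.0
    for r, m, c in zip(resp, means, covs):
        # Per-component conditional mean: mu_y + S_yx S_xx^{-1} (x - mu_x).
        y = y + r * (m[dx:] + c[dx:, :dx] @ np.linalg.solve(c[:dx, :dx], x - m[:dx]))
    return y

def recalibrate(means, dx, train_rest, calib_rest):
    """HYPOTHETICAL calibration: shift all input means by a measured offset."""
    shifted = means.copy()
    shifted[:, :dx] += calib_rest - train_rest
    return shifted

# Hypothetical 2-component model: column 0 is a strain signal, column 1 an expression weight.
weights = np.array([0.6, 0.4])
means = np.array([[0.0, 0.0], [3.0, 1.0]])
covs = np.array([[[1.0, 0.5], [0.5, 1.0]], [[0.5, 0.2], [0.2, 0.3]]])
print(gmr_predict(np.array([1.0]), weights, means, covs, dx=1))
```

A constant shift of the input means is the simplest possible readjustment; it only compensates for a global sensor offset (for example, a slightly different headset placement), not for changes in signal scale.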


“…In terms of 3D facial geometry reconstruction for the refinement of landmarks, recently there has been an increasing amount of research based on 2D images and videos [19, 35–41]. In order to accurately track facial landmarks, it is important to first reconstruct face geometry.…”

Section: Literature Review (mentioning)

confidence: 99%

“…For example, methods such as those in Refs. [19, 37, 40] can reconstruct details such as wrinkles and track subtle facial movements, but are affected by shadows and occlusions. Robust methods such as Refs.…”

Section: Literature Review (mentioning)

confidence: 99%

Robust facial landmark detection and tracking across poses and expressions for in-the-wild monocular video

et al. 2017


We present a novel approach for automatically detecting and tracking facial landmarks across poses and expressions in in-the-wild monocular video data, e.g., YouTube videos and smartphone recordings. Our method does not require any calibration or manual adjustment for new input videos or actors. First, we propose a method for robust 2D facial landmark detection across poses that combines shape-face canonical correlation analysis with a global supervised descent method. Because 2D regression-based methods are sensitive to unstable initialization and ignore the temporal and spatial coherence of videos, we use a coarse-to-dense 3D facial expression reconstruction method to refine the 2D landmarks. For the coarse stage, we employ an in-the-wild method to extract the coarse reconstruction result and its corresponding texture from the detected sparse facial landmarks, followed by robust pose, expression, and identity estimation. For the dense stage, we propose a face tracking flow method that corrects the coarse reconstruction results and tracks weakly textured areas; this is used to iteratively update the coarse face model. Finally, once this iteration converges, a dense reconstruction result is estimated. Extensive experiments on a variety of video sequences, either recorded by us or downloaded from YouTube, show the results of facial landmark detection and tracking under various lighting conditions, head poses, and facial expressions. The overall performance and a comparison with state-of-the-art methods demonstrate the robustness and effectiveness of our method.

Manuscript received: 2016-09-04; accepted: 2016-12-20
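The supervised descent method (SDM) named in the abstract refines a landmark estimate with a cascade of learned linear updates, x_{k+1} = x_k + R_k φ(x_k) + b_k, where φ extracts features at the current estimate. The toy sketch below learns each stage (R_k, b_k) by least squares. It assumes a synthetic `features` function as a stand-in for image descriptors sampled around each landmark, and it omits the shape-face canonical correlation analysis and the "global" extension used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def features(x, x_true):
    # Synthetic stand-in for image features sampled at the current estimate:
    # a nonlinear, monotone function of the offset from the true landmarks.
    return np.tanh(x - x_true)

def stage_inputs(x, x_true):
    # Append a constant column so each stage also learns the bias b_k.
    return np.hstack([features(x, x_true), np.ones((len(x), 1))])

def train_sdm(x0, x_true, stages=4):
    """Learn one linear regressor per stage by least squares on training pairs."""
    regressors, x = [], x0.copy()
    for _ in range(stages):
        phi = stage_inputs(x, x_true)
        w, *_ = np.linalg.lstsq(phi, x_true - x, rcond=None)  # fit the desired update
        regressors.append(w)
        x = x + phi @ w  # advance the training estimates to the next stage
    return regressors

def apply_sdm(x0, x_true, regressors):
    """Run the learned cascade from an initial estimate."""
    x = x0.copy()
    for w in regressors:
        x = x + stage_inputs(x, x_true) @ w
    return x

# 500 training "shapes" with 4 coordinates each, initialised with perturbations.
x_true = rng.standard_normal((500, 4))
x0 = x_true + rng.normal(scale=1.5, size=x_true.shape)
regs = train_sdm(x0, x_true)

# Held-out samples drawn from the same distribution.
x_test_true = rng.standard_normal((100, 4))
x_test0 = x_test_true + rng.normal(scale=1.5, size=x_test_true.shape)
err_before = np.abs(x_test0 - x_test_true).mean()
err_after = np.abs(apply_sdm(x_test0, x_test_true, regs) - x_test_true).mean()
print(err_before, err_after)
```

Each stage only needs to be a good linear correction for the error distribution left by the previous stage, which is why a cascade of simple regressors can handle a nonlinear feature response. The abstract's point that such 2D regressors are sensitive to initialization corresponds here to the scale of the perturbation in `x0`.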


“…While the methods above are able to infer detailed geometry, we aim for the creation of an avatar of the recorded user that can be animated programmatically or using other sources of tracking parameters. The systems of [Garrido et al. 2013] and [Shi et al. 2014] essentially recover detailed facial geometry by providing one mesh per frame deformed to match the input data. The former uses a pre-built user-specific blendshape model for the face alignment by employing automatically corrected feature points [Saragih et al. 2011].…”

Section: Dynamic Modeling (mentioning)

confidence: 99%

“…Although our tracking approach and detail enhancement are based on similar principles, the aim of our approach is to integrate all these shape corrections directly into our proposed two-scale representation of dynamic 3D faces. Shi et al. [2014] use their own feature detector along with a non-rigid structure-from-motion algorithm to track and model the identity and per-frame expressions of the face by employing a bilinear face model. Additionally, a keyframe-based iterative approach using shape from shading is employed in order to further refine the bilinear model parameters, as well as the albedo texture of the face, and per-frame normal maps exhibiting high-frequency details such as wrinkles. Neither method aims at creating an animation-ready avatar that incorporates all of the extracted details.…”

Section: Dynamic Modeling (mentioning)

confidence: 99%
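The passage above describes Shi et al. [2014] modeling identity and per-frame expression with a bilinear face model. In the usual formulation, a core tensor C is contracted with an identity weight vector and an expression weight vector, V = C ×₂ w_id ×₃ w_exp. The sketch below assumes a random core tensor, a flattened [x0, y0, z0, x1, …] vertex layout, and the hypothetical name `bilinear_face`; it shows only model evaluation, not the paper's fitting procedure.

```python
import numpy as np

def bilinear_face(core, w_id, w_exp):
    """Evaluate V = C x_2 w_id x_3 w_exp for a core tensor of shape (3n, n_id, n_exp)."""
    return np.einsum("vie,i,e->v", core, w_id, w_exp)

rng = np.random.default_rng(0)
n_vertices, n_id, n_exp = 100, 10, 5
core = rng.standard_normal((3 * n_vertices, n_id, n_exp))  # hypothetical core tensor
w_id = rng.standard_normal(n_id)
w_exp = rng.standard_normal(n_exp)

# Fixing the identity collapses the model to a linear expression basis (3n, n_exp);
# this is one reason fitting methods often alternate between the two weight vectors.
expr_basis = np.einsum("vie,i->ve", core, w_id)

mesh = bilinear_face(core, w_id, w_exp).reshape(n_vertices, 3)
print(mesh.shape, np.allclose(expr_basis @ w_exp, mesh.ravel()))  # → (100, 3) True
```

Once identity weights are estimated for a subject, each video frame only requires solving for the low-dimensional expression weights against the precomputed basis.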

Dynamic 3D avatar creation from hand-held video input

2015


EPFL

Figure 1: Our system creates a fully rigged 3D avatar of the user from uncalibrated video input acquired with a cell-phone camera. The blendshape models of the reconstructed avatars are augmented with textures and dynamic detail maps and can be animated in real-time.

Abstract: We present a complete pipeline for creating fully rigged, personalized 3D facial avatars from hand-held video. Our system faithfully recovers facial expression dynamics of the user by adapting a blendshape template to an image sequence of recorded expressions using an optimization that integrates feature tracking, optical flow, and shape from shading. Fine-scale details such as wrinkles are captured separately in normal maps and ambient occlusion maps. From this user- and expression-specific data, we learn a regressor for on-the-fly detail synthesis during animation to enhance the perceptual realism of the avatars. Our system demonstrates that the use of appropriate reconstruction priors yields compelling face rigs even with a minimalistic acquisition system and limited user assistance. This facilitates a range of new applications in computer animation and consumer-level online communication based on personalized avatars. We present real-time application demos to validate our method.
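The abstract adapts a blendshape template to the user. In its most common "delta" form, a blendshape rig is a neutral mesh plus a weighted sum of per-expression offsets, B(w) = b₀ + Σᵢ wᵢ (bᵢ − b₀). The sketch below assumes that form and uses the hypothetical name `evaluate_blendshapes`; it covers only evaluating a rig, not the optimization that fits the template to video or the detail-synthesis regressor.

```python
import numpy as np

def evaluate_blendshapes(neutral, targets, weights):
    """Delta blendshape model: neutral + sum_i weights[i] * (targets[i] - neutral).

    neutral: (n, 3) vertex positions; targets: (k, n, 3); weights: (k,).
    """
    deltas = targets - neutral  # per-expression offsets, broadcast over the k targets
    return neutral + np.tensordot(weights, deltas, axes=1)  # weighted sum over k

# Hypothetical 4-vertex rig with two expression targets.
neutral = np.ones((4, 3))
targets = np.stack([2 * np.ones((4, 3)), 3 * np.ones((4, 3))])
smile = evaluate_blendshapes(neutral, targets, np.array([0.5, 0.25]))
print(smile[0])  # → [2. 2. 2.]
```

Storing offsets rather than absolute shapes means that all-zero weights give the neutral face and each weight adds its expression independently, which keeps per-frame tracking a small linear problem in the weights.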

