Leveraging motion capture and 3D scanning for high-fidelity facial performance acquisition

Huang, Haoda; Chai, Jinxiang; Tong, Xin; Wu, Hsiang-Tao

doi:10.1145/1964921.1964969

Cited by 48 publications

(42 citation statements)

References 22 publications

“…The advantage of hand animation is that the artist can precisely style and time the animation, but it is extremely costly and time consuming to produce. The main alternative to hand animation is performance-driven animation using facial motion capture of an actor's face [Beeler et al 2011;Cao et al 2015, 2013;Fyfe et al 2014;Huang et al 2011;Li et al 2013;Weise et al 2011;Weng et al 2014;Zhang et al 2004]. Performance-driven animation requires an actor to perform all shots, and may generate animation parameters that are complex and time consuming for an animator to edit (e.g.…”

Section: Related Work (mentioning)

confidence: 99%

A deep learning approach for generalized speech animation

et al. 2017

Fig. 1. A machine learning approach is used to learn a regression function mapping phoneme labels to speech animation. Our approach generates continuous, natural-looking speech animation for a reference face parameterization that can be retargeted to the face of any computer-generated character.

We introduce a simple and effective deep learning approach to automatically generate natural-looking speech animation that synchronizes to input speech. Our approach uses a sliding window predictor that learns arbitrary nonlinear mappings from phoneme label input sequences to mouth movements in a way that accurately captures natural motion and visual coarticulation effects. Our deep learning approach enjoys several attractive properties: it runs in real time, requires minimal parameter tuning, generalizes well to novel input speech sequences, is easily edited to create stylized and emotional speech, and is compatible with existing animation retargeting approaches. One important focus of our work is to develop an effective approach for speech animation that can be easily integrated into existing production pipelines. We provide a detailed description of our end-to-end approach, including machine learning design decisions. Generalized speech animation results are demonstrated over a wide range of animation clips on a variety of characters and voices, including singing and foreign language input. Our approach can also generate on-demand speech animation in real time from user speech input.
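
The abstract above describes a sliding window predictor that maps windows of phoneme labels to mouth movements. As a rough illustration only, the Python sketch below shows just the windowing and overlap-blending step. It assumes a hypothetical `predictor` callable that returns one mouth-parameter value per frame of its input window, a hypothetical padding label `"sil"`, and simple averaging of overlapping predictions. It is not the authors' implementation, and the trained regression model itself is not shown.

```python
from typing import Callable, List, Sequence

# Hypothetical predictor type: maps a window of phoneme labels to one
# mouth-parameter value per frame of that window (a stand-in for a trained model).
Predictor = Callable[[Sequence[str]], List[float]]


def sliding_window_animate(phonemes: Sequence[str], predictor: Predictor,
                           window: int = 5, pad: str = "sil") -> List[float]:
    """Predict one output window per input frame and average overlapping predictions."""
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centered on a frame")
    half = window // 2
    n = len(phonemes)
    # Pad both ends (with an assumed silence label) so every frame has a full window.
    padded = [pad] * half + list(phonemes) + [pad] * half
    sums = [0.0] * n
    counts = [0] * n
    for t in range(n):
        # Input window of labels centered on sequence frame t.
        labels = padded[t:t + window]
        outputs = predictor(labels)
        if len(outputs) != window:
            raise ValueError("predictor must return one value per window frame")
        # Output k of this window corresponds to sequence frame t - half + k.
        for k, value in enumerate(outputs):
            frame = t - half + k
            if 0 <= frame < n:
                sums[frame] += value
                counts[frame] += 1
    # Every frame receives at least one prediction (the window centered on it).
    return [s / c for s, c in zip(sums, counts)]
```

When the predictor's output depends on the whole window, each frame's final value is an average of several overlapping predictions, which is one simple way to obtain smooth transitions between neighboring phonemes.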

“…Recent efforts in this area (e.g. [Bickel et al 2007;Huang et al 2011]) have been focused on complementing marker-based systems with other types of capturing devices such as video cameras and/or 3D scanners to improve the resolution and details of reconstructed facial geometry. Marker-based motion capture, however, is expensive and cumbersome for 3D facial performance capture.…”

Section: Introduction (mentioning)

confidence: 99%

Automatic acquisition of high-fidelity facial performances using monocular videos

Shi et al. 2014

Self Cite

Figure 1: Our system automatically captures high-fidelity facial performances using Internet videos: (left) input video data; (middle) the captured facial performances; (right) facial editing results: wrinkle removal and facial geometry editing.

Abstract: This paper presents a facial performance capture system that automatically captures high-fidelity facial performances using uncontrolled monocular videos (e.g., Internet videos). We start the process by detecting and tracking important facial features such as the nose tip and mouth corners across the entire sequence and then use the detected facial features along with multilinear facial models to reconstruct 3D head poses and large-scale facial deformation of the subject at each frame. We utilize per-pixel shading cues to add fine-scale surface details, such as emerging or disappearing wrinkles and folds, to the large-scale facial deformation. As a final step, we iterate our reconstruction procedure on large-scale facial geometry and fine-scale facial details to further improve the accuracy of facial reconstruction. We have tested our system on monocular videos downloaded from the Internet, demonstrating its accuracy and robustness under a variety of uncontrolled lighting conditions and despite significant shape differences across individuals. We show that our system advances the state of the art in facial performance capture by comparing against alternative methods.
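
The abstract relies on multilinear facial models to reconstruct large-scale facial deformation. As an illustration only, the sketch below evaluates a two-mode (identity by expression) bilinear face model by contracting a core tensor with an identity weight vector and an expression weight vector. The tensor layout `core[v][i][e]` (flattened vertex coordinate, identity mode, expression mode) and the function name are assumptions; the paper's fitting of head pose and model weights to tracked features, and its shading-based detail step, are not shown.

```python
from typing import List, Sequence


def multilinear_face(core: Sequence[Sequence[Sequence[float]]],
                     w_id: Sequence[float],
                     w_exp: Sequence[float]) -> List[float]:
    """Contract an assumed core tensor core[v][i][e] with identity and expression weights.

    Returns one value per flattened vertex coordinate v:
        out[v] = sum_i sum_e core[v][i][e] * w_id[i] * w_exp[e]
    """
    out = []
    for v, slab in enumerate(core):
        # Each slab is an (identity x expression) matrix for one vertex coordinate.
        if len(slab) != len(w_id) or any(len(row) != len(w_exp) for row in slab):
            raise ValueError(f"core slice {v} does not match the weight dimensions")
        out.append(sum(slab[i][e] * w_id[i] * w_exp[e]
                       for i in range(len(w_id))
                       for e in range(len(w_exp))))
    return out
```

Because the model is linear in each weight vector separately, fixing the identity weights for a subject leaves a linear expression model, which is one reason such bilinear models are convenient for per-frame fitting.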

“…The work of Alexander et al [2013] recently extended this approach to enable real-time rendering of highly detailed facial rigs. Structured light and laser scanners have also been used to acquire facial geometry at the wrinkle scale [Zhang et al 2004;Ma et al 2008;Li et al 2009;Huang et al 2011]. Similarly, the setup of [Beeler et al 2010;Beeler et al 2011] is capable of reconstructing fine-scale detail using multiple calibrated/synchronized DSLR cameras.…”

Section: Dynamic Modeling (mentioning)

confidence: 99%

Dynamic 3D avatar creation from hand-held video input

2015

Figure 1: Our system creates a fully rigged 3D avatar of the user from uncalibrated video input acquired with a cell-phone camera. The blendshape models of the reconstructed avatars are augmented with textures and dynamic detail maps, and can be animated in realtime.

Abstract: We present a complete pipeline for creating fully rigged, personalized 3D facial avatars from hand-held video. Our system faithfully recovers facial expression dynamics of the user by adapting a blendshape template to an image sequence of recorded expressions using an optimization that integrates feature tracking, optical flow, and shape from shading. Fine-scale details such as wrinkles are captured separately in normal maps and ambient occlusion maps. From this user- and expression-specific data, we learn a regressor for on-the-fly detail synthesis during animation to enhance the perceptual realism of the avatars. Our system demonstrates that the use of appropriate reconstruction priors yields compelling face rigs even with a minimalistic acquisition system and limited user assistance. This facilitates a range of new applications in computer animation and consumer-level online communication based on personalized avatars. We present realtime application demos to validate our method.
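
The abstract describes rigs built from a blendshape template. As an illustration only, the sketch below evaluates a common delta blendshape formulation, `v = b0 + sum_i w_i * (b_i - b0)`, where `b0` is the neutral mesh and each `b_i` is an expression target with the same vertex count. This formulation, the function name, and the vertex layout are assumptions for illustration; the abstract does not state the exact rig parameterization, and the paper's optimization (feature tracking, optical flow, shape from shading) is not shown.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def evaluate_blendshapes(neutral: Sequence[Vec3],
                         targets: Sequence[Sequence[Vec3]],
                         weights: Sequence[float]) -> List[Vec3]:
    """Delta blendshape model: each vertex is b0 + sum_i w_i * (b_i - b0)."""
    if len(targets) != len(weights):
        raise ValueError("need exactly one weight per blendshape target")
    if any(len(t) != len(neutral) for t in targets):
        raise ValueError("every target must have the same vertex count as the neutral mesh")
    result = []
    for j, (x0, y0, z0) in enumerate(neutral):
        x, y, z = x0, y0, z0
        # Add each target's offset from the neutral pose, scaled by its weight.
        for target, w in zip(targets, weights):
            tx, ty, tz = target[j]
            x += w * (tx - x0)
            y += w * (ty - y0)
            z += w * (tz - z0)
        result.append((x, y, z))
    return result
```

With all weights at zero the function returns the neutral mesh unchanged, and a single weight of one reproduces that target exactly, which is the usual reason the delta form is preferred when blending several expressions.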
