Neural Face Models for Example-Based Visual Speech Synthesis

Paier, Wolfgang; Hilsmann, Anna; Eisert, Peter

doi:10.1145/3429341.3429356

“…As a result, humans are extremely sensitive to movement and rendering artefacts, which gives rise to the well-known uncanny valley in photo-realistic rendering of human appearance. Recently there has been significant progress using deep generative models to synthesise highly realistic images (Goodfellow et al, 2014; Kingma and Welling, 2013; Zhu et al, 2017; Isola et al, 2016; Ulyanov et al, 2016; Ma et al, 2017; Siarohin et al, 2017; Paier et al, 2020) and videos (Vondrick et al, 2016; Tulyakov et al, 2017) of scenes, which is important for applications such as image manipulation, video animation and rendering of virtual environments. Human avatars are typically rendered using detailed, explicit 3D models, which consist of meshes and textures, and animated using tailored motion models to simulate human behaviour and activity.…”

Section: Introduction

confidence: 99%

Deep4D: A Compact Generative Representation for Volumetric Video

Regateiro, Volino, Hilton

2021

Front. Virtual Real.



This paper introduces Deep4D, a compact generative representation of shape and appearance from captured 4D volumetric video sequences of people. 4D volumetric video achieves highly realistic reproduction, replay and free-viewpoint rendering of actor performance from multiple-view video acquisition systems. A deep generative network is trained on 4D video sequences of an actor performing multiple motions to learn a generative model of the dynamic shape and appearance. We demonstrate that the proposed generative model can provide a compact encoded representation capable of high-quality synthesis of 4D volumetric video with two orders of magnitude compression. A variational encoder-decoder network is employed to learn an encoded latent space that maps from 3D skeletal pose to 4D shape and appearance. This enables high-quality 4D volumetric video synthesis to be driven by skeletal motion, including skeletal motion capture data. This encoded latent space supports the representation of multiple sequences with dynamic interpolation to transition between motions. Therefore, we introduce Deep4D motion graphs, a direct application of the proposed generative representation. Deep4D motion graphs allow real-time interactive character animation whilst preserving the plausible realism of movement and appearance from the captured volumetric video. Deep4D motion graphs implicitly combine multiple captured motions from a unified representation for character animation from volumetric video, allowing novel character movements to be generated with dynamic shape and appearance detail.

