Neural Head Reenactment with Latent Pose Descriptors

Burkov, Egor; Pasechnik, Igor; Grigorev, Artur; Lempitsky, Victor

doi:10.1109/cvpr42600.2020.01380

Cited by 109 publications

(87 citation statements)

References 25 publications

“…, I (K) } by encoder E n . Notably, data augmentation is also introduced in [6] for learning face reenactment. Different from their goal, our derivation of this space is to assist better feature learning and further representation modularization.…”

Section: Identifying Non-identity Feature Space (mentioning)

confidence: 99%

Talking Face Generation by Adversarially Disentangled Audio-Visual Representation

Zhou

Liu

et al. 2019

AAAI

395

274

Talking face generation aims to synthesize a sequence of face images that correspond to a clip of speech. This is a challenging task because face appearance variation and semantics of speech are coupled together in the subtle movements of the talking face regions. Existing works either construct specific face appearance model on specific subjects or model the transformation between lip motion and speech. In this work, we integrate both aspects and enable arbitrary-subject talking face generation by learning disentangled audio-visual representation. We find that the talking face sequence is actually a composition of both subject-related information and speech-related information. These two spaces are then explicitly disentangled through a novel associative-and-adversarial training process. This disentangled representation has an advantage where both audio and video can serve as inputs for generation. Extensive experiments show that the proposed approach generates realistic talking face sequences on arbitrary subjects with much clearer lip motion patterns than previous work. We also demonstrate the learned audio-visual representation is extremely useful for the tasks of automatic lip reading and audio-video retrieval.

“…However, dedicated device setup and heavily manual work are always needed for generating a realistic avatar and reconstructing the detailed appearance, subtle expressions, and gaze movement of a subject. Recent deep-learning based methods [6,12,30,32,58,78,79,82,85,86,88] avoid 3D avatar modeling and directly synthesize a talking head video of a subject from one source image of the subject and a video sequence. Elgharib et al [18] developed a solution for warping the video of a subject's face from side view to front view.…”

Section: Free Viewpoint Video Of Human Characters (mentioning)

confidence: 99%

VirtualCube: An Immersive 3D Video Communication System

Zhang

Yang

Liu

et al. 2021

Preprint

Fig. 1. Snapshots of the VirtualCube system in action, with the local participant in the foreground. The images of remote participants on the screen are synthesized from the RGBD data acquired by cameras. (a) A face-to-face meeting with two participants. (b) A round-table meeting with multiple participants, each in a different location. No two participants are in the same location. (c) A side-by-side meeting that includes sharing work items on the participants' screens, as if the participants were sitting next to each other working together. Our system achieves mutual eye contact and visual attention as in in-person meetings. The lively recorded videos can be found on the project page.

“…However, existing datasets collected from unlisted sources online remain unorganized and noisy, narrowing their applicability for developing data-driven models. For example, previous work on head reenactment [7,17,32,37,41,42] in recent years requires two frames extracted from the same video for training, which is infeasible using existing animation datasets.…”

Section: Related Work (mentioning)

confidence: 99%

AnimeCeleb: Large-Scale Animation CelebHeads Dataset for Head Reenactment

Kim

Park

Lee

et al. 2021

Preprint

Despite remarkable success in deep learning-based face-related models, these models are still limited to the domain of real human faces. On the other hand, the domain of animation faces has been studied less intensively due to the absence of a well-organized dataset. In this paper, we present a large-scale animation celebfaces dataset (AnimeCeleb) via controllable synthetic animation models to boost research on the animation face domain. To facilitate the data generation process, we build a semi-automatic pipeline based on an open 3D software and a developed annotation system. This leads to constructing a large-scale animation face dataset that includes multi-pose and multi-style animation faces with rich annotations. Experiments suggest that our dataset is applicable to various animation-related tasks such as head reenactment and colorization.
