Thin-Plate Spline Motion Model for Image Animation

Zhao, Jian; Zhang, Hui

doi:10.1109/cvpr52688.2022.00364

Cited by 99 publications

(37 citation statements)

References 22 publications


“…A key focus area of such works is to design appropriate motion representations for animation [35,52,55,66]. A number of improved representations have been proposed, such as those setting additional constraints on a kinematic tree [59], and thin-plate spline motion modelling [81]. A further work, titled Latent Image Animator [64], learned a latent space for possible motions.…”

Section: Related Work (mentioning)

confidence: 99%

“…Initially, such transformations were modeled using a simple set of sparse keypoints. Further works improved the motion representation [52,55], learned latent motion dictionaries [64], kinematic chains [59] or used thin-plate spline transformations [81]. However, broadly speaking, all such works propose 2D motion representations, warping the pixels or features of the input image such that they correspond to the pose of a given driving image.…”

Section: Introduction (mentioning)

confidence: 99%


Unsupervised Volumetric Animation

Siarohin, Menapace, Skorokhodov et al. 2023

Preprint


“…MetaPortrait [6] introduces an ID-preserving talking head generation framework that leverages dense landmarks for accurate geometry-aware flow fields and adaptively fuses source identity during synthesis for better preservation of key characteristics. Besides these third-party model-based methods, some video-driven talking head generation methods [9], [10], [11], [12], [44] attempt to learn keypoints of the human face to represent the facial expression in a self-supervised manner. FOMM [9] introduces a self-supervised image animation framework that decouples appearance and motion information, and computes the motion between two faces by using their keypoints.…”

Section: Talking Head Synthesis (mentioning)

confidence: 99%

“…Fig. 6: Qualitative comparisons on the self-reenactment experiment on the VoxCeleb1 dataset [56] (panel columns: Driving, FOMM [9], DaGAN [12], TPSM [44], Ours).…”

Section: Source (mentioning)

confidence: 99%


Depth-Aware Generative Adversarial Network for Talking Head Video Generation

Hong, Zhang, Shen et al. 2022

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)


Predominant techniques for talking head generation largely depend on 2D information, including facial appearances and motions from input face images. Nevertheless, dense 3D facial geometry, such as pixel-wise depth, plays a critical role in constructing accurate 3D facial structures and suppressing complex background noise for generation. However, dense 3D annotations for facial videos are prohibitively costly to obtain. In this work, firstly, we present a novel self-supervised method for learning dense 3D facial geometry (i.e., depth) from face videos, without requiring camera parameters and 3D geometry annotations in training. We further propose a strategy to learn pixel-level uncertainties to perceive more reliable rigid-motion pixels for geometry learning. Secondly, we design an effective geometry-guided facial keypoint estimation module, providing accurate keypoints for generating motion fields. Lastly, we develop a 3D-aware cross-modal (i.e., appearance and depth) attention mechanism, which can be applied to each generation layer, to capture facial geometries in a coarse-to-fine manner. Extensive experiments are conducted on three challenging benchmarks (i.e., VoxCeleb1, VoxCeleb2, and HDTF). The results demonstrate that our proposed framework can generate highly realistic-looking reenacted talking videos, with new state-of-the-art performance established on these benchmarks. The code and trained models are publicly available on the GitHub project page.


Wav2Lip‐HR: Synthesising clear high‐resolution talking head in the wild

Liang, Wang, Chen et al. 2023

Computer Animation & Virtual


Talking head generation aims to synthesize a photo‐realistic speaking video with accurate lip motion. While this field has attracted more attention in recent audio‐visual research, most existing methods do not improve lip synchronization and visual quality simultaneously. In this paper, we propose Wav2Lip‐HR, a neural‐based audio‐driven high‐resolution talking head generation method. With our technique, all that is required to generate a clear high‐resolution lip‐sync talking video is an image or video of the target face and an audio clip of any speech. The primary benefit of our method is that it generates clear high‐resolution videos with sufficient facial detail, rather than videos that are merely large in size but lack clarity. We first analyze key factors that limit the clarity of generated videos and then put forth several important solutions to address the problem, including data augmentation, model structure improvement and a more effective loss function. Finally, we employ several efficient metrics to evaluate the clarity of images generated by our proposed approach as well as several widely used metrics to evaluate lip‐sync performance. Numerous experiments demonstrate that our method has superior performance on visual quality and lip synchronization when compared to other existing schemes.
