Realistic Speech-Driven Talking Video Generation with Personalized Pose

Zhang, Xu; Weng, Liguo

doi:10.1155/2020/6629634

Cited by 3 publications

References 27 publications

Wav2Lip‐HR: Synthesising clear high‐resolution talking head in the wild

Liang, Wang, Chen et al. 2023

Computer Animation & Virtual

Talking head generation aims to synthesize a photo‐realistic speaking video with accurate lip motion. While this field has attracted growing attention in recent audio‐visual research, most existing methods do not improve lip synchronization and visual quality at the same time. In this paper, we propose Wav2Lip‐HR, a neural‐based audio‐driven high‐resolution talking head generation method. With our technique, all that is required to generate a clear high‐resolution lip‐sync talking video is an image or video of the target face and an audio clip of any speech. The primary benefit of our method is that it generates clear high‐resolution videos with sufficient facial detail, rather than videos that are merely large in size but lack clarity. We first analyze the key factors that limit the clarity of generated videos and then put forth several solutions to address the problem, including data augmentation, model structure improvement, and a more effective loss function. Finally, we employ several efficient metrics to evaluate the clarity of images generated by our approach, as well as several widely used metrics to evaluate lip‐sync performance. Numerous experiments demonstrate that our method outperforms existing schemes in both visual quality and lip synchronization.

Facial and Neck Region Analysis for Deepfake Detection Using Remote Photoplethysmography Signal Similarity

An, Lim, Seong et al. 2024

IET Biometrics

Deepfake (DF) technology uses artificial intelligence (AI) to synthesize or manipulate images, voices, and other human or object data. Misuse of DF technology has surged recently, raising concerns about cybercrime and the credibility of manipulated information. The objective of this study is to devise a method that employs remote photoplethysmography (rPPG) biosignals for DF detection. The face was divided into five regions based on landmarks, and the neck region was extracted automatically. We extracted an rPPG signal from each facial region and from the neck region, and the neck signal was defined as the ground truth. The Euclidean distance between each of the five facial signals and the neck signal was calculated, and the resulting five rPPG similarity features were used as inputs to a support vector machine (SVM) model. Our approach demonstrated robust performance, with an area under the curve (AUC) score of 91.2% on the audio‐driven dataset and 99.7% on the face swapping generative adversarial network (FSGAN) dataset, even though we used only the parts of the Korean DF Detection Dataset (KoDF) that exclude DF techniques that can be visually identified. Our findings therefore demonstrate that similarity features of rPPG signals can serve as key features for detecting DFs.
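
The feature-construction step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy signal values, and the assumption that all signals are equal-length lists of samples are hypothetical, and rPPG extraction and SVM training are left out.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length signals."""
    if len(a) != len(b):
        raise ValueError("signals must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similarity_features(face_signals, neck_signal):
    """One distance per facial region: five regions give five features."""
    return [euclidean_distance(sig, neck_signal) for sig in face_signals]

# Toy values standing in for rPPG signals (hypothetical, not real data).
neck = [0.0, 1.0, 0.0, -1.0]
face = [
    [0.0, 1.0, 0.0, -1.0],  # region 1: identical to the neck signal
    [0.0, 1.0, 0.0, 0.0],   # region 2
    [3.0, 1.0, 4.0, -1.0],  # region 3
    [0.0, 1.0, 0.0, -3.0],  # region 4
    [1.0, 2.0, 1.0, 0.0],   # region 5
]

features = similarity_features(face, neck)
print(features)
```

Each five-element feature vector would then be passed to a classifier such as an SVM. Smaller distances indicate that a facial region's signal resembles the neck signal more closely.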
