2019
DOI: 10.1609/aaai.v33i01.33019299

Talking Face Generation by Adversarially Disentangled Audio-Visual Representation

Abstract: Talking face generation aims to synthesize a sequence of face images that correspond to a clip of speech. This is a challenging task because face appearance variation and semantics of speech are coupled together in the subtle movements of the talking face regions. Existing works either construct specific face appearance model on specific subjects or model the transformation between lip motion and speech. In this work, we integrate both aspects and enable arbitrary-subject talking face generation by learning di…

Citations: cited by 394 publications (291 citation statements)
References: 66 publications
“…
Algorithm          PSNR    SSIM    LMD
Chung et al [8]    28.06   0.460   2.22
Zhou et al [35]    26.80   0.884   -
LipGAN (Ours)      33.4    0.960   0.60
Table 5: Our proposed LipGAN model achieves significant improvements over existing competitive approaches across all standard quantitative metrics.…”
Section: Quantitative Evaluation
Citation type: mentioning
confidence: 93%
“…This leads them to use a simple fully convolutional encoder-decoder model. Even more recently, a different solution to the problem was proposed by Zhou et al [35], in which they use audio-visual speech recognition as a probe task for associating audio-visual representations, and then employ adversarial learning to disentangle the subject-related and speech-related information inside them. However, we observed two major limitations in their work.…”
Section: Talking Face Synthesis From Audio
Citation type: mentioning
confidence: 99%