2023
DOI: 10.48550/arxiv.2301.02379
Preprint

CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior

Abstract: Speech-driven 3D facial animation has been widely studied, yet there is still a gap to achieving realism and vividness due to the highly ill-posed nature and scarcity of audiovisual data. Existing works typically formulate the cross-modal mapping as a regression task, which suffers from the regression-to-mean problem, leading to over-smoothed facial motions. In this paper, we propose to cast speech-driven facial animation as a code query task in a finite proxy space of the learned codebook, which effectively pr…
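
As a rough illustration of what a "code query" in a finite codebook can look like, the sketch below maps each continuous feature vector to the index of its nearest codebook entry and then reads the entry back. It is not the authors' implementation: the function names, the toy two-dimensional codebook, and the use of squared Euclidean distance are all assumptions made for this example.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Return, for each feature, the index of its nearest codebook entry."""
    indices = []
    for feature in features:
        nearest = min(
            range(len(codebook)),
            key=lambda k: squared_distance(feature, codebook[k]),
        )
        indices.append(nearest)
    return indices


def lookup(indices: Sequence[int], codebook: Sequence[Vector]) -> List[List[float]]:
    """Replace each index with a copy of the codebook entry it points to."""
    return [list(codebook[i]) for i in indices]


# Hypothetical toy data: three codebook entries and three continuous features.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
features = [[0.1, -0.2], [0.9, 1.2], [3.5, 0.4]]

indices = quantize(features, codebook)   # discrete codes
quantized = lookup(indices, codebook)    # features snapped to the codebook
print(indices, quantized)
```

Because every output is drawn from the finite set of codebook entries, a predictor that selects indices cannot average between entries, which is one intuition for why a discrete space may avoid the over-smoothing associated with direct regression.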

Cited by 3 publications
(3 citation statements)
references
References 52 publications
“…Among the methods for discrete representation learning, the VQ-VAE approach (Van Den Oord, Vinyals et al 2017) has gained significant popularity for quantizing latent features into a learned codebook. Ng et al (2022) and Xing et al (2023) their approaches, we explore a multi-task VQ-VAE to extract the speaking style with the help of a learned codebook.…”
Section: Discrete Representation Learning (mentioning)
confidence: 99%
“…3D face reconstruction from 2D images has received a tremendous amount of attention in computer vision and has made major progresses thanks to the highly accurate modeling capability of deep learning. 3D face reconstruction enables a wide range of applications such as speech-driven 3D facial animation, 3D avatar generation, virtual makeup, performance capture, virtual and augmented reality, and human-robot interaction [2][3][4][5][6][7].…”
Section: Introduction (mentioning)
confidence: 99%