EmoTalk: Speech-driven emotional disentanglement for 3D face animation
2023 · Preprint
DOI: 10.48550/arxiv.2303.11089

Abstract: Speech-driven 3D face animation aims to generate realistic facial expressions that match the speech content and emotion. However, existing methods often neglect emotional facial expressions or fail to disentangle them from speech content. To address this issue, this paper proposes an end-to-end neural network to disentangle different emotions in speech so as to generate rich 3D facial expressions. Specifically, we introduce the emotion disentangling encoder (EDE) to disentangle the emotion and content in the s…
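The abstract describes an encoder that separates emotion from content in the speech signal. The sketch below is a hypothetical illustration of that idea, not the authors' EmoTalk implementation: the two-branch encoder, the feature dimensions, and the cross-reconstruction loss are assumptions drawn from common disentanglement practice.

    # Hypothetical sketch of an emotion/content disentangling encoder for speech
    # features. This is NOT the authors' EmoTalk code; the branch design,
    # dimensions, and training loss are assumptions.
    import torch
    import torch.nn as nn

    class DisentanglingEncoder(nn.Module):
        """Splits a speech feature sequence into a content code and an emotion code."""
        def __init__(self, feat_dim=768, content_dim=256, emotion_dim=64):
            super().__init__()
            # Two separate branches route information into either the content
            # space (what is said) or the emotion space (how it is said).
            self.content_branch = nn.Sequential(
                nn.Linear(feat_dim, content_dim), nn.ReLU(),
                nn.Linear(content_dim, content_dim),
            )
            self.emotion_branch = nn.Sequential(
                nn.Linear(feat_dim, emotion_dim), nn.ReLU(),
                nn.Linear(emotion_dim, emotion_dim),
            )

        def forward(self, speech_feats):
            # speech_feats: (batch, time, feat_dim) frame-level audio features
            content = self.content_branch(speech_feats)              # per-frame content code
            emotion = self.emotion_branch(speech_feats.mean(dim=1))  # utterance-level emotion code
            return content, emotion

    def cross_reconstruction_loss(decoder, encoder, feats_a, feats_b, target_a_with_emotion_b):
        # One common way to encourage disentanglement: pair the content of
        # utterance A with the emotion of utterance B and ask a decoder to
        # reproduce the corresponding facial-animation target.
        content_a, _ = encoder(feats_a)
        _, emotion_b = encoder(feats_b)
        pred = decoder(content_a, emotion_b)
        return nn.functional.mse_loss(pred, target_a_with_emotion_b)

The exact feature extractors and losses in EmoTalk differ; the point of the sketch is only that content and emotion are encoded separately, so the emotion code can be swapped or controlled independently of the spoken content.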

Cited by 5 publications (6 citation statements) · References 46 publications
“…For example, EAMM (Ji et al 2022) aims at generating one-shot emotional talking faces on arbitrary subjects, and it extracts emotion patterns from the source video. EmoTalk (Peng et al 2023) is a speech-driven 3D face animation method, while our approach can be applied in both video-driven and audio-driven settings. GC-AVT…”
Section: Emotion Editing In Talking Head Videos
Mentioning confidence: 99%
“…When interacting with virtual characters, real-time generation is critical to providing a realistic and immersive user experience. This allows for direct, natural communication and presence, as achieved by [59, 61, 101, 102] (Scenario 1). The computational complexity required to rapidly process and render realistic audio-visual input makes it difficult to achieve real-time talking head production [58, 62, 103] (Scenario 4).…”
Section: Representation Of Realism
Mentioning confidence: 99%
“…Similarly, Xing et al [110] introduce the innovative CodeTalker method, which aims to generate realistic facial animations from speech signals, enhancing the realism of virtual characters. Further, Peng et al [102] treat emotional expressions as a means of producing animations with a heightened sense of realism. In particular, Haque and Yumak [62] combine speech-driven facial expressions with enhanced realism, offering users a more convincing and authentic experience.…”
Section: Covered Criteria For Talking Head Implementation
Mentioning confidence: 99%
“…To our knowledge, [Karras et al 2017; Peng et al 2023] addressed emotional expressiveness for the audio-driven 3D facial animation synthesis task. Our goal is to explore and study emotional expressiveness in speech-driven 3D facial animation synthesis in more detail and to answer the research questions introduced in the previous section by proposing novel approaches for the synthesis task.…”
Section: Background and Related Work
Mentioning confidence: 99%
“…However, vision-based 4D reconstruction models such as DECA [Feng et al 2021] and EMOCA [Danecek et al 2022] have gained traction in recent years for producing emotionally expressive 3D mesh sequences from videos. We have seen in [Ng et al 2022; Peng et al 2023] that such vision-based models are used to create synthetic datasets from 2D videos. With EMOCA, we plan to employ a similar strategy to create our own synthetic dataset with labeled emotion categories together with continuous valence and arousal information, as depicted in Fig.…”
Section: Sub RQ3
Mentioning confidence: 99%
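The statement above describes building a synthetic, emotion-labeled dataset by running a vision-based reconstruction model such as EMOCA over 2D videos. Below is a minimal, hypothetical sketch of such a pipeline; the reconstruct_expression_params callable and the label fields are placeholders, not the actual EMOCA/DECA interfaces or the cited papers' code.

    # Hypothetical data-building sketch: the label fields and the reconstruction
    # callable are assumptions, not an actual EMOCA/DECA API.
    from dataclasses import dataclass
    from typing import Callable, Dict, List, Tuple

    @dataclass
    class LabeledSequence:
        expression_params: list   # per-frame 3D expression/jaw parameters (pseudo ground truth)
        emotion_category: str     # e.g. "happy", "angry"
        valence: float            # continuous affect label
        arousal: float            # continuous affect label

    def build_synthetic_dataset(
        videos: List[str],
        labels: Dict[str, Tuple[str, float, float]],           # video path -> (category, valence, arousal)
        reconstruct_expression_params: Callable[[str], list],  # wraps the vision-based reconstructor
    ) -> List[LabeledSequence]:
        dataset = []
        for path in videos:
            params = reconstruct_expression_params(path)  # 3D motion recovered from the 2D video
            category, valence, arousal = labels[path]
            dataset.append(LabeledSequence(params, category, valence, arousal))
        return dataset

The design point is simply that the video-derived 3D parameters serve as pseudo ground truth, while the emotion category and valence/arousal values come from the video-level annotations.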