GestureDiffuCLIP: Gesture Diffusion Model with CLIP Latents

2023 · Preprint · DOI: 10.48550/arxiv.2303.14613

Abstract (excerpt): "…a user study, we show that our system outperforms the state-of-the-art approaches regarding human likeness, appropriateness, and style correctness. A demonstration of our system can be found here¹. CCS Concepts: • Computing methodologies → Animation; Natural language processing; Neural networks."

Cited by 5 publications (12 citation statements, all classified as mentioning) · References 54 publications
"…To compute this metric, we utilize the autoencoder pre-trained by BEAT (Liu et al. 2022a). [We compare with] (Yoon et al. 2019; Ginosar et al. 2019; Yoon et al. 2020; Li et al. 2021a; Liu et al. 2022a; Yi et al. 2022; Ao, Zhang, and Liu 2023) in terms of FGD, SRGR and BeatAlign. All methods are trained on the BEAT dataset.…"
Section: Evaluation Metrics (mentioning; confidence: 99%)
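For context on the FGD comparisons cited above: FGD (Fréchet Gesture Distance) follows the Fréchet Inception Distance recipe, swapping the image network for the encoder of a gesture autoencoder. Below is a minimal sketch, assuming real and generated gesture sequences have already been encoded into fixed-size latents by the BEAT-pretrained encoder; the function and array names are illustrative, not the actual BEAT API.

# Hedged sketch of Frechet Gesture Distance (FGD). Assumes two
# (num_sequences, latent_dim) arrays of encoder latents produced by
# the autoencoder pre-trained on BEAT (names here are illustrative).
import numpy as np
from scipy import linalg

def frechet_gesture_distance(real_latents, gen_latents):
    # Fit a Gaussian (mean, covariance) to each latent set.
    mu_r, mu_g = real_latents.mean(axis=0), gen_latents.mean(axis=0)
    sigma_r = np.cov(real_latents, rowvar=False)
    sigma_g = np.cov(gen_latents, rowvar=False)
    # Matrix square root of the covariance product; small imaginary
    # components from numerical error are discarded.
    covmean, _ = linalg.sqrtm(sigma_r @ sigma_g, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean))

Lower FGD means the latent distribution of generated motion sits closer to that of real motion.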
"…While substantial progress has been made in generating gestures synchronized to audio (Ginosar et al. 2019; Qian et al. 2021; Yazdian, Chen, and Lim 2022; Yang et al. 2023d; Ao, Zhang, and Liu 2023; Yang et al. 2023a), there has been limited exploration of emotive gesture generation that […] [Figure 1: Performance comparison with limited modalities during inference.] The performance of existing multimodal methods is significantly hindered by the inadequate incorporation of multiple modalities during the inference stage.…"
Section: Introduction (mentioning; confidence: 99%)
"…LDA [4] enables the system to generate style gestures with classifier-free guidance. Additionally, recent research has explored using textual prompts to generate stylized gestures [16]. Given that human emotions are more accurately represented on a continuous spectrum [32], [33] and emerge from a complex interplay of fuzzy factors, depending on discrete emotion labels can oversimplify the gesture generation process.…"
Section: A Condition Extraction Mechanism (mentioning; confidence: 99%)
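The classifier-free guidance mentioned in this excerpt blends conditional and unconditional denoiser outputs at sampling time. Below is a minimal sketch of one guided denoising step, assuming a hypothetical noise-prediction network eps_model(x_t, t, cond) that accepts cond=None for the unconditional branch; it illustrates the general technique, not the actual LDA [4] or GestureDiffuCLIP [16] implementation.

# Hedged sketch of classifier-free guidance for a style-conditioned
# gesture diffusion model. `eps_model` is a hypothetical denoiser.
import torch

def guided_eps(eps_model, x_t, t, style_cond, guidance_scale=2.5):
    # Unconditional prediction: the style condition is dropped,
    # mirroring the condition masking applied during training.
    eps_uncond = eps_model(x_t, t, cond=None)
    # Conditional prediction using the style embedding.
    eps_cond = eps_model(x_t, t, cond=style_cond)
    # Scale 0 recovers the unconditional model, 1 the plain
    # conditional model; values above 1 exaggerate the condition.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

Larger guidance scales strengthen adherence to the style condition at the cost of sample diversity, which is why the scale is typically tuned per dataset.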
"…For the Trinity dataset, we employed LDA [4] and Taming [15]. In addition to LDA and Taming, for the ZEGGS dataset we also incorporated DiffuseStyleGesture (DSG) [2] and ZeroEGGS [7]. Furthermore, for the BEAT dataset, we utilized the same baseline models as in ZEGGS but replaced DSG with DSG+ [3] and introduced GestureDiffuCLIP (GDC) [16] as an additional baseline model.…"
Section: Subjective and Objective Evaluation (mentioning; confidence: 99%)