2022
DOI: 10.1145/3550454.3555435

Rhythmic Gesticulator

Abstract: Automatic synthesis of realistic co-speech gestures is an increasingly important yet challenging task in artificial embodied agent creation. Previous systems mainly focus on generating gestures in an end-to-end manner, which leads to difficulties in mining the clear rhythm and semantics due to the complex yet subtle harmony between speech and gestures. We present a novel co-speech gesture synthesis method that achieves convincing results both on the rhythm and semantics. For the rhythm, our system contains a r…



Citations: cited by 73 publications (26 citation statements)
References: 50 publications
“…Due to the inherent many-to-many relationship between speech and gesture, end-to-end models can generate natural-looking gestures but face challenges in ensuring content matching between speech and generated gestures [Yoon et al 2022]. To address this issue, some neural systems aim to explicitly model both rhythm and semantics through the model structure [Ao et al 2022; Kucherenko et al 2021; Liu et al 2022a] or the training supervision strategy. Furthermore, hybrid systems, such as the combination of deep features and motion graphs [Zhou et al 2022], have been proposed to harness the advantages of different approaches.…”
Section: Related Work, 2.1 Co-speech Gesture Synthesis (mentioning)
confidence: 99%
“…This raw motion representation, however, often contains redundant information. Following recent successful systems [Ao et al 2022;Dhariwal et al 2020;Rombach et al 2022], we learn a compact motion representation using VQ-VAE [van den Oord et al 2017] to ensure motion quality and diversity.…”
Section: Motion Representation (mentioning)
confidence: 99%
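The compact motion representation mentioned here relies on vector quantization. As a rough, generic illustration (not the cited system's actual architecture), the following PyTorch sketch shows the core VQ-VAE quantization step from van den Oord et al. [2017]: continuous encoder features are snapped to their nearest codebook entries, with a straight-through estimator passing gradients back to the encoder. All sizes (512 codes, 128-dimensional features) and the commitment weight are assumed for illustration.

```python
# Minimal VQ-VAE-style vector quantizer (illustrative sizes, not the cited model's).
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes=512, code_dim=128, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta  # commitment loss weight

    def forward(self, z_e):
        # z_e: (batch, frames, code_dim) continuous motion features from an encoder
        flat = z_e.reshape(-1, z_e.shape[-1])
        # squared Euclidean distance from each feature vector to every codebook entry
        dist = (flat.pow(2).sum(1, keepdim=True)
                - 2 * flat @ self.codebook.weight.t()
                + self.codebook.weight.pow(2).sum(1))
        codes = dist.argmin(dim=1)                    # one discrete token per frame
        z_q = self.codebook(codes).view_as(z_e)       # quantized (codebook) features
        # codebook update + commitment losses (van den Oord et al. 2017)
        loss = ((z_q - z_e.detach()).pow(2).mean()
                + self.beta * (z_e - z_q.detach()).pow(2).mean())
        # straight-through estimator: copy gradients from z_q to z_e
        z_q = z_e + (z_q - z_e).detach()
        return z_q, codes, loss
```

The resulting discrete token sequence is what a downstream generator would typically predict from speech, which is what makes the representation "compact" relative to raw joint rotations.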
“…In recent years, the compelling performance of deep neural networks has prompted data-driven approaches. Previous studies establish large-scale speech-gesture corpora to learn the mapping from speech audio to human skeletons in an end-to-end manner [4, 5, 25, 27, 30, 34, 39]. To attain more expressive results, Ginosar et al [16] and Yoon et al [41] propose GAN-based methods that guarantee realism via an adversarial mechanism, where the discriminator is trained to distinguish real gestures from synthetic ones while the generator's objective is to fool the discriminator.…”
Section: Introduction (mentioning)
confidence: 99%
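To make that adversarial mechanism concrete, the sketch below pairs a toy speech-to-pose generator with a discriminator trained to score pose sequences as real or synthetic; the generator is then updated to fool it. All module choices, feature sizes, and the GRU/MLP architecture are assumptions for illustration and are not taken from Ginosar et al. [16] or Yoon et al. [41].

```python
# Minimal adversarial training step for speech-to-gesture synthesis (illustrative only).
import torch
import torch.nn as nn

speech_dim, pose_dim, frames = 64, 45, 120  # assumed feature sizes

generator = nn.GRU(speech_dim, pose_dim, batch_first=True)   # speech features -> pose sequence
discriminator = nn.Sequential(                                # pose sequence -> real/fake logit
    nn.Flatten(),
    nn.Linear(frames * pose_dim, 256),
    nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
)

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(speech, real_poses):
    # speech: (B, frames, speech_dim); real_poses: (B, frames, pose_dim)
    fake_poses, _ = generator(speech)
    real_label = torch.ones(speech.size(0), 1)
    fake_label = torch.zeros(speech.size(0), 1)

    # 1) discriminator learns to separate captured motion from generated motion
    d_loss = (bce(discriminator(real_poses), real_label)
              + bce(discriminator(fake_poses.detach()), fake_label))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) generator is rewarded for fooling the discriminator
    #    (real systems also add reconstruction/regression losses on the poses)
    g_loss = bce(discriminator(fake_poses), real_label)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```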