SGToolkit: An Interactive Gesture Authoring Toolkit for Embodied Conversational Agents

Yoon, Youngwoo; Park, Keunwoo; Jang, Minsu; Kim, Jae Hong; Lee, Geehyuk

doi:10.1145/3472749.3474789

Cited by 17 publications

(8 citation statements)

References 32 publications

“…The authors employ a probabilistic model that predicts the next pose distribution instead of predicting a fixed pose; gesture motion can then be re‐sampled repeatedly to obtain a variety of sequences. Similarly, in [YPJ*21] a gesture generation toolkit is presented with the control parameters speed, spacial extent, and handedness. The system in [SGD21] uses the Laban Effort and Shape qualities as animation modifiers to impart the intended personality to the character.…”

Section: Related Work (mentioning)

confidence: 99%

“…Previous works have sought to address the problem of creating distinct styles by modelling and generating gestures for specific speakers [NKAS08, GBK*19, YCL*20, ALNM20] and by modifying gesture motion through general statistics such as hand height and velocity [AHKB20, YPJ*21]. These approaches lack flexibility because they are limited by the content of the training data.…”

Section: Introduction (mentioning)

confidence: 99%

ZeroEGGS: Zero‐shot Example‐based Gesture Generation from Speech

Ghorbani, Ferstl, Holden, et al. 2023. Computer Graphics Forum

We present ZeroEGGS, a neural network framework for speech-driven gesture generation with zero-shot style control by example. This means style can be controlled via only a short example motion clip, even for motion styles unseen during training. Our model uses a Variational framework to learn a style embedding, making it easy to modify style through latent space manipulation or blending and scaling of style embeddings. The probabilistic nature of our framework further enables the generation of a variety of outputs given the input, addressing the stochastic nature of gesture motion. In a series of experiments, we first demonstrate the flexibility and generalizability of our model to new speakers and styles. In a user study, we then show that our model outperforms previous state-of-the-art techniques in naturalness of motion, appropriateness for speech, and style portrayal. Finally, we release a high-quality dataset of full-body gesture motion including fingers, with speech, spanning across 19 different styles. Our code and data are publicly available at https://github.com/ubisoft/ubisoft-laforge-ZeroEGGS.

“…deictic gestures for a lecturer in front of display) from gesture style differences between speakers [ALNM20]. Yoon et al [YPJ*21] recently proposed an innovative approach to this challenge: an authoring toolkit that balances gesture quality and authoring effort. The toolkit combines automatic gesture generation using a GAN‐based generative model [YCL*20] and manual controls.…”

Section: Key Challenges Of Gesture Generation (mentioning)

confidence: 99%

A Comprehensive Review of Data‐Driven Co‐Speech Gesture Generation

Nyatsanga, Kucherenko, Ahuja, et al. 2023. Computer Graphics Forum

Gestures that accompany speech are an essential part of natural and efficient embodied human communication. The automatic generation of such co‐speech gestures is a long‐standing problem in computer animation and is considered an enabling technology for creating believable characters in film, games, and virtual social spaces, as well as for interaction with social robots. The problem is made challenging by the idiosyncratic and non‐periodic nature of human co‐speech gesture motion, and by the great diversity of communicative functions that gestures encompass. The field of gesture generation has seen surging interest in the last few years, owing to the emergence of more and larger datasets of human gesture motion, combined with strides in deep‐learning‐based generative models that benefit from the growing availability of data. This review article summarizes co‐speech gesture generation research, with a particular focus on deep generative models. First, we articulate the theory describing human gesticulation and how it complements speech. Next, we briefly discuss rule‐based and classical statistical gesture synthesis, before delving into deep learning approaches. We employ the choice of input modalities as an organizing principle, examining systems that generate gestures from audio, text and non‐linguistic input. Concurrent with the exposition of deep learning approaches, we chronicle the evolution of the related training data sets in terms of size, diversity, motion quality, and collection method (e.g., optical motion capture or pose estimation from video). Finally, we identify key research challenges in gesture generation, including data availability and quality; producing human‐like motion; grounding the gesture in the co‐occurring speech in interaction with other speakers, and in the environment; performing gesture evaluation; and integration of gesture synthesis into applications. 
We highlight recent approaches to tackling the various key challenges, as well as the limitations of these approaches, and point toward areas of future development.

“…In addition to the objective evaluation, we conducted a subjective evaluation in that human participants rate the gesture motion videos of a virtual character. We followed the evaluation scheme introduced in GENEA Challenge 2020 [18] and was used in related studies [33], [12]. The evaluation scheme consists of two studies that measure human-likeness of generated motion and appropriateness of motion to the input speech.…”

Section: Subjective Evaluations (mentioning)

confidence: 99%

Co-Speech Gesture Synthesis using Discrete Gesture Token Learning

Lu, Yoon, Feng. 2023. Preprint

Synthesizing realistic co-speech gestures is an important and yet unsolved problem for creating believable motions that can drive a humanoid robot to interact and communicate with human users. Such capability will improve the impressions of the robots by human users and will find applications in education, training, and medical services. One challenge in learning the co-speech gesture model is that there may be multiple viable gesture motions for the same speech utterance. The deterministic regression methods can not resolve the conflicting samples and may produce over-smoothed or damped motions. We proposed a two-stage model to address this uncertainty issue in gesture synthesis by modeling the gesture segments as discrete latent codes. Our method utilizes RQ-VAE in the first stage to learn a discrete codebook consisting of gesture tokens from training data. In the second stage, a two-level autoregressive transformer model is used to learn the prior distribution of residual codes conditioned on input speech context. Since the inference is formulated as token sampling, multiple gesture sequences could be generated given the same speech input using top-k sampling. The quantitative results and the user study showed the proposed method outperforms the previous methods and is able to generate realistic and diverse gesture motions.
