2020 16th International Conference on Control, Automation, Robotics and Vision (ICARCV)
DOI: 10.1109/icarcv50220.2020.9305330

SRG3: Speech-driven Robot Gesture Generation with GAN

Abstract: Fig. 1: Our robot gesture generation pipeline. The speech audio, together with a random noise vector, is fed to the 3D pose synthesizer, which generates natural, human-like 3D gesture sequences in joint-position (x, y, z) space. Moreover, the same speech audio combined with multiple different random noise vectors yields multiple natural gesture expressions, just as humans produce similar but distinct gestures when expressing the same speech in different contexts and situations. The gesture retargetin…
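
The pipeline described above is a conditional generative model: per-frame audio features and a latent noise vector jointly condition a synthesizer whose output lives in joint-position (x, y, z) space, and re-sampling the noise for the same speech produces different gestures. Below is a minimal PyTorch sketch of that idea only; the module layout, layer sizes, feature dimensions, and joint count are illustrative assumptions, not the SRG3 architecture or its training setup (the GAN discriminator is omitted).

```python
# Minimal sketch of a conditional generator in the spirit of the pipeline
# described above: audio features plus a random noise vector are mapped to
# a sequence of 3D joint positions. All module names, layer sizes, and the
# joint count are illustrative assumptions, not the paper's SRG3 model.
import torch
import torch.nn as nn

class GestureGenerator(nn.Module):
    def __init__(self, audio_dim=64, noise_dim=16, hidden_dim=256, n_joints=10):
        super().__init__()
        # Encode per-frame audio features together with the noise vector.
        self.rnn = nn.GRU(audio_dim + noise_dim, hidden_dim, batch_first=True)
        # Decode each hidden state to (x, y, z) positions for every joint.
        self.head = nn.Linear(hidden_dim, n_joints * 3)

    def forward(self, audio_feats, noise):
        # audio_feats: (batch, frames, audio_dim)
        # noise:       (batch, noise_dim), broadcast over all frames so the
        #              same latent "style" shapes the whole gesture sequence.
        frames = audio_feats.size(1)
        z = noise.unsqueeze(1).expand(-1, frames, -1)
        h, _ = self.rnn(torch.cat([audio_feats, z], dim=-1))
        poses = self.head(h)            # (batch, frames, n_joints * 3)
        return poses.view(-1, frames, poses.size(-1) // 3, 3)

# Same speech, two different noise vectors -> two distinct gesture sequences,
# mirroring the one-speech-to-many-gestures property described in the abstract.
gen = GestureGenerator()
audio = torch.randn(1, 120, 64)         # ~120 frames of audio features
g1 = gen(audio, torch.randn(1, 16))
g2 = gen(audio, torch.randn(1, 16))
print(g1.shape)                         # torch.Size([1, 120, 10, 3])
```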

Cited by 11 publications (7 citation statements). References 26 publications.
“…Then, a training step based on objective numerical errors is performed. Also in 2020, Yu et al. [206] proposed a speech-driven generation method that maps a representation of the speech audio to a set of appropriate gestures using a Generative Adversarial Network architecture. It has the advantage of being able to generate multiple gesture patterns for a single speech input by using different random noise vectors.…”
Section: Comparison of Co-speech Gesture Prediction/Generation Methods
confidence: 99%
“…Other approaches include the use of prosodic features, as shown by the work of Chiu et al. [190], or directly encoding the audio signal; the latter method appears in the approaches presented by Yu et al. [206] and Ahuja et al. [200]. Some works combine both audio features and text to improve the obtained results.…”
Section: Multimodality
confidence: 99%
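
A minimal sketch of the two audio conditioning strategies contrasted in the excerpt above, using librosa: hand-crafted prosodic features (fundamental frequency and per-frame energy) versus a spectral encoding of the raw signal that a learned encoder would consume. The example clip, frame sizes, and feature choices are illustrative assumptions, not taken from any of the cited papers.

```python
# Sketch of the two audio conditioning strategies: (a) prosodic features,
# (b) a direct spectral encoding of the raw signal. Parameters and the
# example clip are illustrative assumptions.
import librosa

y, sr = librosa.load(librosa.ex("trumpet"))  # any mono speech/audio clip works

# (a) Prosodic features: fundamental frequency (pitch) and RMS energy.
f0, _, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                        fmax=librosa.note_to_hz("C7"))
energy = librosa.feature.rms(y=y)[0]

# (b) Direct encoding: a mel spectrogram of the raw signal, the kind of
# representation a learned audio encoder would take as input.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)

print(f0.shape, energy.shape, mel.shape)  # per-frame f0, energy; (64, frames)
```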