2020
DOI: 10.1007/978-3-030-58523-5_15

Style Transfer for Co-speech Gesture Animation: A Multi-speaker Conditional-Mixture Approach

Cited by 62 publications (74 citation statements)
References 38 publications
“…An adversarial discriminator is used to evaluate whether the generated motions match the style of the speaker. The method for generating co-speech gestures proposed by Ahuja et al. [200] in 2020 also aimed at reflecting the gestural style of different speakers while maintaining the content of the gesture. The style of the speaker is encoded in a 2D space and defined by the idiosyncrasy of each speaker and a series of contextual circumstances, such as the orientation of the body or the posture (standing versus sitting, for example).…”
Section: Comparison of Co-speech Gesture Prediction/Generation Methods (mentioning)
confidence: 99%
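The statement above describes two mechanisms: a low-dimensional style embedding per speaker and an adversarial discriminator that checks whether generated motion matches that speaker's style. The following is a minimal PyTorch sketch of how such components could be wired together; the class names, dimensions, and the 2-D style space are illustrative assumptions, not the cited authors' implementation.

```python
# Hedged sketch: speaker-style-conditioned gesture generation with an
# adversarial style discriminator. All names and sizes are assumptions.
import torch
import torch.nn as nn

class StyleConditionedGenerator(nn.Module):
    def __init__(self, num_speakers, speech_dim=128, pose_dim=36, style_dim=2):
        super().__init__()
        # Each speaker maps to a point in a low-dimensional (here 2-D) style space.
        self.style_embedding = nn.Embedding(num_speakers, style_dim)
        self.decoder = nn.GRU(speech_dim + style_dim, 256, batch_first=True)
        self.to_pose = nn.Linear(256, pose_dim)

    def forward(self, speech_feats, speaker_id):
        # speech_feats: (batch, time, speech_dim); speaker_id: (batch,)
        style = self.style_embedding(speaker_id)                      # (batch, style_dim)
        style = style.unsqueeze(1).expand(-1, speech_feats.size(1), -1)
        hidden, _ = self.decoder(torch.cat([speech_feats, style], dim=-1))
        return self.to_pose(hidden)                                   # (batch, time, pose_dim)

class StyleDiscriminator(nn.Module):
    """Scores whether a motion sequence matches each speaker's gestural style."""
    def __init__(self, num_speakers, pose_dim=36):
        super().__init__()
        self.encoder = nn.GRU(pose_dim, 128, batch_first=True)
        self.classify = nn.Linear(128, num_speakers)

    def forward(self, poses):
        _, h = self.encoder(poses)            # h: (1, batch, 128)
        return self.classify(h.squeeze(0))    # per-speaker style logits
```

In an adversarial setup of this kind, the discriminator's per-speaker logits would penalize generated sequences whose style does not match the conditioning speaker, while the generator learns to keep the gesture content driven by the speech features.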
“…Other approaches include the use of prosodic features, as shown by the work of Chiu et al. [190], or directly encoding the audio signal; the latter appears in the approaches presented by Yu et al. [206] and Ahuja et al. [200]. Some works combine both audio features and text to improve the obtained results.…”
Section: Multimodality (mentioning)
confidence: 99%
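Since the statement above contrasts audio-only encoding with approaches that combine audio and text, here is a hedged sketch of one simple fusion scheme (frame-level concatenation of the two encodings). The fusion strategy, feature dimensions, and names are illustrative assumptions rather than any specific cited method.

```python
# Hedged sketch: multimodal (audio + text) gesture prediction via
# concatenation fusion. Dimensions and names are assumptions.
import torch
import torch.nn as nn

class MultimodalGestureModel(nn.Module):
    def __init__(self, audio_dim=80, text_dim=300, pose_dim=36, hidden=256):
        super().__init__()
        self.audio_enc = nn.GRU(audio_dim, hidden // 2, batch_first=True)
        self.text_enc = nn.GRU(text_dim, hidden // 2, batch_first=True)
        self.fuse = nn.GRU(hidden, hidden, batch_first=True)
        self.to_pose = nn.Linear(hidden, pose_dim)

    def forward(self, audio_feats, text_feats):
        # audio_feats: (batch, T, audio_dim), e.g. mel-spectrogram frames
        # text_feats:  (batch, T, text_dim),  e.g. word embeddings aligned to frames
        a, _ = self.audio_enc(audio_feats)
        t, _ = self.text_enc(text_feats)
        fused, _ = self.fuse(torch.cat([a, t], dim=-1))  # simple concatenation fusion
        return self.to_pose(fused)                       # (batch, T, pose_dim)
```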
“…Style control vectors were input to the neural network model, and it was demonstrated that the model generates gestures following the input style intents. Speaker identities were also used to generate stylized gestures reflecting inter-person variability [1,44]. A style embedding space was trained from a large corpus of gestures of different speakers in [44].…”
Section: Controllable Gesture Generation (mentioning)
confidence: 99%
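The statement above refers to explicit style control vectors fed to the network alongside the speech input. Below is a minimal, self-contained sketch of that conditioning pattern; the style dimensions (one per hypothetical style attribute), tensor sizes, and class name are assumptions for illustration only.

```python
# Hedged sketch: gesture generation conditioned on a user-supplied style
# control vector. All attribute meanings and sizes are assumptions.
import torch
import torch.nn as nn

class ControllableGestureGenerator(nn.Module):
    def __init__(self, speech_dim=128, style_dim=8, pose_dim=36, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(speech_dim + style_dim, hidden, batch_first=True)
        self.to_pose = nn.Linear(hidden, pose_dim)

    def forward(self, speech_feats, style_vector):
        # style_vector: (batch, style_dim), e.g. one dimension per style
        # attribute (hand height, gesture radius, speed, ...).
        style = style_vector.unsqueeze(1).expand(-1, speech_feats.size(1), -1)
        h, _ = self.rnn(torch.cat([speech_feats, style], dim=-1))
        return self.to_pose(h)

# Example: decode a 100-frame window with one chosen style intent.
model = ControllableGestureGenerator()
speech = torch.randn(1, 100, 128)                                  # speech features
style = torch.tensor([[1.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]])   # style intent
poses = model(speech, style)                                       # (1, 100, 36)
```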
“…A style embedding space was trained from a large corpus of gestures of different speakers in [44]. Style disentanglement was studied in [1].…”
Section: Controllable Gesture Generation (mentioning)
confidence: 99%