2019 International Conference on Multimodal Interaction
DOI: 10.1145/3340555.3353725

To React or not to React: End-to-End Visual Pose Forecasting for Personalized Avatar during Dyadic Conversations

Abstract: Non-verbal behaviours such as gestures, facial expressions, body posture, and para-linguistic cues have been shown to complement or clarify verbal messages. Hence, to improve telepresence in the form of an avatar, it is important to model these behaviours, especially in dyadic interactions. Creating such personalized avatars not only requires modelling intrapersonal dynamics between an avatar's speech and their body pose, but it also needs to model interpersonal dynamics with the interlocutor present in the conversation…

Cited by 51 publications (36 citation statements)
References 49 publications
“…To obtain semantic information for the speech, we first transcribed the audio recordings using Google Cloud automatic speech recognition (ASR), followed by thorough manual review to correct recognition errors and add punctuation for both the training and test parts of the dataset. The same data was used by the GENEA 2020 gesture generation challenge 1 and has been made publicly available in the original dataset repository 2 .…”
Section: Training and Test Data
confidence: 99%
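The two-stage transcription pipeline quoted above (automatic recognition followed by manual correction) can be sketched with the Google Cloud Speech-to-Text Python client. This is a minimal illustration rather than the cited authors' code; the bucket URI, audio encoding, sample rate, and language code are assumptions.

```python
# Minimal sketch of the first (automatic) step of the transcription pipeline
# described above, using the Google Cloud Speech-to-Text Python client.
# The URI, encoding, sample rate, and language code are illustrative
# assumptions, not values from the cited paper.
from google.cloud import speech

def transcribe(gcs_uri: str) -> str:
    client = speech.SpeechClient()
    audio = speech.RecognitionAudio(uri=gcs_uri)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
        # ASR punctuation is a starting point; the quoted pipeline still
        # relies on manual review to correct errors and punctuation.
        enable_automatic_punctuation=True,
    )
    response = client.recognize(config=config, audio=audio)
    # Concatenate the top hypothesis of each result into a draft transcript,
    # which human reviewers would then correct.
    return " ".join(r.alternatives[0].transcript for r in response.results)

print(transcribe("gs://my-bucket/recording.wav"))  # hypothetical URI
```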
“…Those approaches are constrained by the discrete set of gestures they can produce. Alongside recent advances in deep learning, data-driven approaches have increasingly gained interest for gesture generation [1,27,48]. While early work has considered gesture generation as a classification task which aims to deduce a specified gesture class [9,37], more recent work has considered it as a regression task which aims to produce continuous motion [2,48].…”
confidence: 99%
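The shift the quote describes, from classification over a discrete gesture set to regression of continuous motion, can be made concrete with a toy contrast. A minimal PyTorch sketch, where the feature and pose dimensions and the architectures are illustrative assumptions:

```python
# Toy contrast between the two formulations described above, in PyTorch.
# All dimensions and architectures are illustrative assumptions.
import torch
import torch.nn as nn

SPEECH_DIM, N_CLASSES, POSE_DIM = 64, 10, 45  # hypothetical sizes

# Classification view: map an utterance-level speech feature to one of a
# discrete set of gesture classes.
classifier = nn.Sequential(nn.Linear(SPEECH_DIM, 128), nn.ReLU(),
                           nn.Linear(128, N_CLASSES))
logits = classifier(torch.randn(2, SPEECH_DIM))  # (batch, N_CLASSES)

# Regression view: map per-frame speech features to continuous joint values,
# producing motion that is not constrained to a fixed gesture inventory.
class PoseRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(SPEECH_DIM, 128, batch_first=True)
        self.head = nn.Linear(128, POSE_DIM)

    def forward(self, speech):           # speech: (batch, frames, SPEECH_DIM)
        h, _ = self.rnn(speech)
        return self.head(h)              # (batch, frames, POSE_DIM)

poses = PoseRegressor()(torch.randn(2, 100, SPEECH_DIM))
print(logits.shape, poses.shape)  # torch.Size([2, 10]) torch.Size([2, 100, 45])
```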
“…Our stimulus-generation method allowed manipulating speech articulation independently from the facial gestures, allowing for varying the facial gestures while controlling for the effect of speech context. We find that: (1) Evaluators can distinguish mimicry segments from mismatched segments (from the same interaction but another point in time) and find mimicry segments more appropriate. This also validates that our feature extraction and stimulus generation methods are appropriate for non-verbal behavior.…”
Section: Introduction
confidence: 85%
“…Our problem formulation is largely inspired by a recent method to model conversational dynamics for gesture generation [1]. Like in that work, we also model avatar behavior based on both the avatar's own speech and the speech and motion of the interlocutor.…”
Section: Interlocutor-aware Gesture Generation
confidence: 99%
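The interlocutor-aware formulation quoted above conditions the avatar's motion on three streams: the avatar's own speech plus the interlocutor's speech and motion. The sketch below shows one plausible conditioning scheme; the per-stream encoders, fusion by concatenation, and all dimensions are assumptions, not the cited model.

```python
# Hedged sketch of interlocutor-aware gesture generation: the avatar's pose
# is predicted from its own speech (intrapersonal dynamics) together with the
# interlocutor's speech and motion (interpersonal dynamics). The fusion
# strategy and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

SPEECH_DIM, POSE_DIM, HIDDEN = 64, 45, 128  # hypothetical sizes

class InterlocutorAwareModel(nn.Module):
    def __init__(self):
        super().__init__()
        # One encoder per input stream; concatenated states drive the decoder.
        self.own_speech = nn.GRU(SPEECH_DIM, HIDDEN, batch_first=True)
        self.other_speech = nn.GRU(SPEECH_DIM, HIDDEN, batch_first=True)
        self.other_pose = nn.GRU(POSE_DIM, HIDDEN, batch_first=True)
        self.decoder = nn.Linear(3 * HIDDEN, POSE_DIM)

    def forward(self, own_sp, other_sp, other_po):
        a, _ = self.own_speech(own_sp)      # avatar's own speech
        b, _ = self.other_speech(other_sp)  # interlocutor speech
        c, _ = self.other_pose(other_po)    # interlocutor motion
        return self.decoder(torch.cat([a, b, c], dim=-1))  # avatar pose/frame

model = InterlocutorAwareModel()
out = model(torch.randn(1, 50, SPEECH_DIM),
            torch.randn(1, 50, SPEECH_DIM),
            torch.randn(1, 50, POSE_DIM))
print(out.shape)  # torch.Size([1, 50, 45])
```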