Proceedings of the 2020 International Conference on Multimodal Interaction 2020
DOI: 10.1145/3382507.3421155

Towards Multimodal Human-Like Characteristics and Expressive Visual Prosody in Virtual Agents

Abstract: A deep learning approach is used to generate upper-face gestures and is trained using facial gestures, audio features, and speech text extracted from TED talks.
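To make the abstract concrete, here is a minimal sketch of the kind of multimodal network it describes: audio features and aligned speech-text embeddings mapped to per-frame upper-face gesture parameters. The module names, feature dimensions, and output parameterization are assumptions for illustration, not the paper's actual architecture.

```python
# Illustrative sketch only: a minimal multimodal network of the kind the
# abstract describes (audio features + speech text -> upper-face gestures).
# All names, dimensions, and the output space are assumptions.
import torch
import torch.nn as nn

class UpperFaceGestureNet(nn.Module):
    def __init__(self, audio_dim=26, text_dim=300, hidden=128, out_dim=10):
        super().__init__()
        # Separate recurrent encoders per modality, fused before decoding.
        self.audio_enc = nn.GRU(audio_dim, hidden, batch_first=True)
        self.text_enc = nn.GRU(text_dim, hidden, batch_first=True)
        # Decoder maps the fused sequence to per-frame gesture parameters
        # (e.g., eyebrow and eyelid action-unit intensities).
        self.decoder = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, audio_feats, text_embeds):
        # audio_feats: (batch, frames, audio_dim), e.g., MFCCs per video frame
        # text_embeds: (batch, frames, text_dim), word embeddings aligned to frames
        a, _ = self.audio_enc(audio_feats)
        t, _ = self.text_enc(text_embeds)
        fused = torch.cat([a, t], dim=-1)
        return self.decoder(fused)  # (batch, frames, out_dim)
```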

Cited by 8 publications (7 citation statements) | References 27 publications
“…and Convolutional Neural Networks (CNNs) combined with GANs [16] were used to predict speech-driven head motion, facial, hand, and body gestures. Other works driven by both speech and text semantics were proposed for upper-facial [11], [9], [10] and hand [19] gesture generation; however, they cannot be used for style modelling and control. Synthesizing expressive gestures while controlling their style has recently been receiving more attention [3], [17], [8], [14].…”
Section: Related Work and Our Contributions
confidence: 99%
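For readers unfamiliar with the adversarial setup this statement cites, a minimal sketch of a speech-driven generator/discriminator pair follows; all shapes and layer choices are assumptions for illustration, not the architecture of [16].

```python
# Hedged sketch of the CNN/GAN idea mentioned above: a convolutional
# generator maps speech features to motion, a discriminator scores
# real vs. generated sequences. Shapes are arbitrary assumptions.
import torch
import torch.nn as nn

class MotionGenerator(nn.Module):
    def __init__(self, speech_dim=26, motion_dim=15, hidden=64):
        super().__init__()
        # 1-D convolutions over time turn a speech-feature sequence into a
        # motion sequence of the same length.
        self.net = nn.Sequential(
            nn.Conv1d(speech_dim, hidden, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(hidden, motion_dim, kernel_size=5, padding=2),
        )

    def forward(self, speech):  # speech: (batch, frames, speech_dim)
        return self.net(speech.transpose(1, 2)).transpose(1, 2)

class MotionDiscriminator(nn.Module):
    def __init__(self, motion_dim=15, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(motion_dim, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)  # one real/fake logit per sequence

    def forward(self, motion):  # motion: (batch, frames, motion_dim)
        _, h = self.rnn(motion)
        return self.score(h[-1])
```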
“…propose a probabilistic approach based on normalizing flows for synthesizing facial gestures in dyadic settings. Facial (Fares et al. [2021b], Fares [2020]) and hand (Kucherenko et al. [2020]) gestures driven by both acoustic and semantic information are the closest approaches to our gesture generation task; however, they cannot be used for the style transfer task.…”
Section: Related Work
confidence: 99%
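As a quick illustration of the normalizing-flow idea this statement refers to, below is a toy affine-coupling layer, the invertible building block such flows stack; it is a generic sketch with arbitrary sizes, not the cited dyadic model.

```python
# Toy affine-coupling step, only to illustrate "normalizing flows";
# all dimensions here are assumptions.
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One invertible coupling step: half the features are transformed by an
    affine map whose scale/shift are predicted from the other half."""
    def __init__(self, dim=10, hidden=32):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(x1).chunk(2, dim=-1)
        y2 = x2 * torch.exp(s) + t  # invertible: x2 = (y2 - t) * exp(-s)
        return torch.cat([x1, y2], dim=-1)

    def inverse(self, y):
        y1, y2 = y[:, :self.half], y[:, self.half:]
        s, t = self.net(y1).chunk(2, dim=-1)
        return torch.cat([y1, (y2 - t) * torch.exp(-s)], dim=-1)
```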
“…Furthermore, transformer networks and attention mechanisms were recently used for upper-facial gesture synthesis based on multimodal data, i.e., text and speech (Fares et al., 2021b). Facial (Fares, 2020; Fares et al., 2021b) and hand (Kucherenko et al., 2020) gestures driven by both acoustic and semantic information are the closest approaches to our gesture generation task; however, they cannot be used for the style transfer task.…”
Section: Related Work
confidence: 99%
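The transformer-and-attention fusion these statements mention can be pictured roughly as follows: speech frames attend to word embeddings before a small encoder produces gesture parameters. Dimensions, layer counts, and the cross-attention layout are illustrative assumptions, not Fares et al.'s model.

```python
# Rough sketch of speech/text fusion with attention, in the spirit of the
# transformer-based synthesis cited above. All hyperparameters are assumed.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, d_model=128, n_heads=4, out_dim=10):
        super().__init__()
        # Speech frames attend to word embeddings, so each output frame can
        # pick up the semantic context of the words being spoken.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            num_layers=2,
        )
        self.head = nn.Linear(d_model, out_dim)

    def forward(self, speech, text):
        # speech: (batch, frames, d_model); text: (batch, words, d_model)
        fused, _ = self.cross_attn(query=speech, key=text, value=text)
        return self.head(self.encoder(fused + speech))  # (batch, frames, out_dim)
```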
“…LSTM networks driven by speech were recently used to predict sequences of gestures (Hasegawa et al., 2018) and body motions (Shlizerman et al., 2018; Ahuja et al., 2019). LSTMs were additionally employed for synthesizing sequences of facial gestures driven by text and speech, namely, the fundamental frequency (F0) (Fares, 2020; Fares et al., 2021a). Generative adversarial networks (GANs) were proposed to generate realistic head motion (Sadoughi and Busso, 2018) and body motions (Ferstl et al., 2019).…”
Section: Related Work
confidence: 99%
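Below is a minimal sketch of the speech-driven LSTM setup described in this statement, where the fundamental frequency (F0) contour drives per-frame facial-gesture parameters; the feature extraction and output space are assumed for illustration, not taken from the cited work.

```python
# Minimal sketch: an F0 contour drives a sequence of facial gestures via
# an LSTM. Input/output shapes are illustrative assumptions.
import torch
import torch.nn as nn

class F0ToGestureLSTM(nn.Module):
    def __init__(self, hidden=64, out_dim=5):
        super().__init__()
        # Input is one F0 value per frame; output is a small vector of
        # facial-gesture parameters per frame.
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.proj = nn.Linear(hidden, out_dim)

    def forward(self, f0):
        # f0: (batch, frames, 1), e.g., pitch in Hz extracted per video frame
        h, _ = self.lstm(f0)
        return self.proj(h)

model = F0ToGestureLSTM()
f0 = torch.randn(2, 100, 1)   # two 100-frame pitch contours
gestures = model(f0)          # (2, 100, 5)
```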