Interspeech 2016
DOI: 10.21437/interspeech.2016-364

Expressive Speech Driven Talking Avatar Synthesis with DBLSTM Using Limited Amount of Emotional Bimodal Data

Cited by 16 publications (8 citation statements); citing publications span 2017–2024.
References 13 publications.
“…The study compared the results achieved with their model with an HMM-based approach, showing improvements in terms of objective and subjective metrics. Li et al. [6] proposed strategies using BLSTM models to create emotional facial movement with access to only a small emotional dataset. They proposed several approaches to leverage recordings from a neutral corpus and a small emotional corpus, aiming to improve the emotional regression result.…”
Section: DNN-based Modeling (mentioning)
confidence: 99%
“…Li et al. [19] used a recurrent network and compared several BLSTM architectures to adapt a model trained on a large neutral corpus with a small quantity of expressive data. The five proposed systems generate expressive visual animations from audio files.…”
Section: Related Work (mentioning)
confidence: 99%
“…One way to overcome this limitation is to take advantage of the available neutral data and link it with the emotional data. For instance, Li et al. [19] used a recurrent network (DBLSTM) to generate audiovisual animation from audio by simply retraining the model with emotion-specific data. Their experiments showed that using a neutral corpus can improve the performance of the synthesis of expressive talking avatar animations.…”
Section: Introduction (mentioning)
confidence: 99%
“…Some methods for automatic expressive 3D character animation have emerged, taking advantage of progress in the deep learning area. Li et al. [7] used a recurrent network (DBLSTM) to generate audiovisual animation from audio by simply retraining the model with emotion-specific data. Their experiments showed that using a neutral corpus can improve the performance of expressive talking avatar generation.…”
Section: Introduction (mentioning)
confidence: 99%
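
The citation statements above all describe the same adaptation strategy: pretrain a speech-to-animation regressor on a large neutral corpus, then retrain it on a small emotional corpus. Below is a minimal sketch of that recipe, not the paper's implementation; every dimension, name, and tensor is a hypothetical placeholder (e.g. 39-dimensional acoustic features mapped to 30 facial animation parameters).

```python
import torch
import torch.nn as nn

class BLSTMRegressor(nn.Module):
    """Frame-wise regressor from acoustic features to animation parameters."""
    def __init__(self, in_dim=39, hidden=128, out_dim=30, layers=2):
        super().__init__()
        # Stacked bidirectional LSTM (a "deep BLSTM", as in the paper title).
        self.blstm = nn.LSTM(in_dim, hidden, num_layers=layers,
                             bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * hidden, out_dim)

    def forward(self, x):        # x: (batch, frames, in_dim)
        h, _ = self.blstm(x)     # h: (batch, frames, 2 * hidden)
        return self.proj(h)      # frame-wise animation parameters

def train(model, batches, epochs, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for audio_feats, visual_params in batches:
            opt.zero_grad()
            loss_fn(model(audio_feats), visual_params).backward()
            opt.step()

# Random stand-ins for the two corpora; a real system would pair acoustic
# features (e.g. MFCCs) with tracked facial animation parameters.
neutral_batches = [(torch.randn(8, 50, 39), torch.randn(8, 50, 30))]
emotional_batches = [(torch.randn(2, 50, 39), torch.randn(2, 50, 30))]

model = BLSTMRegressor()
train(model, neutral_batches, epochs=2, lr=1e-3)    # 1) pretrain on large neutral corpus
train(model, emotional_batches, epochs=2, lr=1e-4)  # 2) fine-tune on small emotional corpus
```

The lower learning rate in the second stage is one common way to let the small emotional set adapt the model without overwriting what the neutral corpus taught; the cited works compare several such transfer strategies.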