2021
DOI: 10.1007/s11704-020-0133-7
Speech-driven facial animation with spectral gathering and temporal attention

Cited by 13 publications (6 citation statements) · References 48 publications

“…Closer to our approach, a number of works produce 3D animations directly from speech [KAL*17; PWP18; TPL*20; RZW*21; CWWZ22]. Using formants as sound representation, Karras et al. [KAL*17] achieve impressive results from less than 4 minutes of training data.…”
Section: Related Work (mentioning, confidence: 99%)
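As a side note on the formant-based sound representation mentioned in the statement above, the following is a minimal sketch of one common way to estimate formants, via LPC analysis of a short audio frame. It is not taken from [KAL*17]; the file name, frame length and LPC order are illustrative assumptions.

```python
# Minimal LPC-based formant estimation sketch (illustrative assumptions only).
import numpy as np
import librosa

def formants(frame, sr, order=12, n_formants=3):
    """Estimate the first few formant frequencies (Hz) of one audio frame."""
    a = librosa.lpc(frame * np.hamming(len(frame)), order=order)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]          # keep one root per conjugate pair
    freqs = np.sort(np.angle(roots) * sr / (2 * np.pi))
    return freqs[:n_formants]                  # real trackers also filter by bandwidth

y, sr = librosa.load("speech.wav", sr=16000)   # hypothetical input file
frame = y[:int(0.025 * sr)]                    # a 25 ms analysis window
print(formants(frame, sr))                     # e.g. three formant estimates in Hz
```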
“…More recently, Chai et al. [CWWZ22] gather information along the frequency dimension of a speech window with a stack of convolutions, but use self-attention layers to collect information along the time dimension. Similar to Cudeiro et al. [CBL*19], their model takes speaker identity as auxiliary input and is thus able to explicitly model different speaking styles.…”
Section: Related Work (mentioning, confidence: 99%)
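For readers trying to picture the "convolutions along frequency, self-attention along time, speaker identity as auxiliary input" pipeline described in the statement above, here is a minimal PyTorch sketch. It is not the authors' implementation: the layer sizes, the mel-spectrogram input, the 5023-vertex output (a VOCA/FLAME-style mesh) and the speaker-embedding interface are all assumptions made for illustration.

```python
# Minimal sketch: gather spectral info per frame with convolutions,
# then attend over time, conditioned on a speaker-identity embedding.
import torch
import torch.nn as nn

class SpectralGatherTemporalAttention(nn.Module):
    def __init__(self, n_mels=80, d_model=128, n_speakers=8, out_dim=5023 * 3):
        super().__init__()
        # Stack of convolutions along the frequency axis of each frame,
        # progressively "gathering" spectral information into one feature vector.
        self.freq_conv = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),              # collapse remaining frequency bins
        )
        self.proj = nn.Linear(64, d_model)
        # Self-attention collects information along the time dimension.
        self.temporal = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
            num_layers=2,
        )
        # Speaker identity as an auxiliary input (learned embedding, assumed interface).
        self.speaker_emb = nn.Embedding(n_speakers, d_model)
        self.head = nn.Linear(2 * d_model, out_dim)   # e.g. per-vertex offsets

    def forward(self, mel, speaker_id):
        # mel: (B, T, n_mels), speaker_id: (B,)
        B, T, F = mel.shape
        x = mel.reshape(B * T, 1, F)                  # one spectrum per frame
        x = self.freq_conv(x).squeeze(-1)             # (B*T, 64)
        x = self.proj(x).reshape(B, T, -1)            # (B, T, d_model)
        x = self.temporal(x)                          # attention over time
        spk = self.speaker_emb(speaker_id)            # (B, d_model)
        spk = spk.unsqueeze(1).expand(-1, T, -1)      # broadcast over frames
        return self.head(torch.cat([x, spk], dim=-1)) # (B, T, out_dim)

# Usage: 8 frames of an 80-bin mel-spectrogram for two speakers.
model = SpectralGatherTemporalAttention()
offsets = model(torch.randn(2, 8, 80), torch.tensor([0, 1]))
print(offsets.shape)  # torch.Size([2, 8, 15069])
```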
“…There are several methods [10][11][12] for obtaining 3D facial parameter representations from 2D monocular videos, but the quality of the synthesized 3D data is limited by the accuracy of 3D reconstruction techniques, which cannot recover subtle 3D changes from 2D video, so the results may be unreliable. Works that generate 3D facial animation directly on 3D meshes [13][14][15] restrict the speech input to short audio windows, which can cause pauses in lip movements as the speech changes and thus reduce the realism of the facial motion.…”
Section: Introduction (mentioning, confidence: 99%)
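To make the "short audio windows" limitation concrete, this is a small sketch of the per-frame windowing scheme such mesh-based methods typically use: each animation frame only sees a fixed-length slice of audio centred on it. The sampling rate, frame rate and window length are assumptions for illustration, not values from [13][14][15].

```python
# Minimal sketch of per-frame audio windowing (assumed parameters).
import numpy as np

def audio_windows(audio, sr=16000, fps=30, win_sec=0.52):
    half = int(win_sec * sr / 2)
    n_frames = int(len(audio) / sr * fps)
    windows = []
    for i in range(n_frames):
        c = int(i / fps * sr)                     # sample index of animation frame i
        w = audio[max(0, c - half): c + half]     # short window centred on the frame
        w = np.pad(w, (0, 2 * half - len(w)))     # pad edge frames to a fixed size
        windows.append(w)
    return np.stack(windows)                      # (n_frames, window_samples)
```

Because each window is processed more or less independently, abrupt differences between consecutive windows can surface as the pauses in lip motion the statement refers to.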