2008
DOI: 10.1007/978-3-540-88190-2_9
Audio-to-Visual Conversion Via HMM Inversion for Speech-Driven Facial Animation

Cited by 8 publications (5 citation statements). References 11 publications.

“…Given a speaker's audio, generating the corresponding video of that person speaking has attracted many researchers' interest. Earlier works mainly used the Hidden Markov model (HMM) to learn the correspondence between speech and facial motions [9][10][11][12][13][14]. Among them, Brand [15] proposed voice puppetry, an HMM-based method for generating talking faces driven only by voice signals.…”
Section: Related Work (mentioning, confidence: 99%)
“…There exist a few approaches to speech-driven talking face generation. Early work in this field mostly used Hidden Markov Models (HMMs) to model the correspondence between speech and facial movements [2,4,8,7,24,20,25]. One notable early work, Voice Puppetry [2], proposed HMM-based talking face generation driven by the speech signal alone.…”
Section: Introduction (mentioning, confidence: 99%)
“…Choi et al. [4] and Terissi et al. [20] used HMM inversion (HMMI) to estimate the visual parameters from speech. Zhang et al. [25] used a DNN to map speech features to HMM states, which are in turn mapped to generated faces.…”
Section: Introduction (mentioning, confidence: 99%)
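
As a loose, runnable illustration of the inversion idea in the excerpt above (not the HMMI algorithm itself, which iteratively re-estimates the visual observation sequence to maximise the joint audio-visual likelihood), one can weight each state's visual parameters by its posterior probability given the audio. The data, dimensions, and the use of hmmlearn below are illustrative assumptions:

import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(1)
audio = rng.normal(size=(300, 13))   # hypothetical acoustic frames (e.g. MFCCs)
visual = rng.normal(size=(300, 6))   # hypothetical aligned visual parameters

# Acoustic HMM standing in for the audio half of a joint A/V model.
model = hmm.GaussianHMM(n_components=8, covariance_type="diag", n_iter=20)
model.fit(audio)

# Per-state visual means estimated from the training alignment.
states = model.predict(audio)
mu_v = np.stack([visual[states == s].mean(axis=0) if np.any(states == s)
                 else np.zeros(visual.shape[1])
                 for s in range(model.n_components)])

# Soft "inversion": posterior state probabilities for new audio, then the
# expected visual frame at each step, rather than one hard state path.
new_audio = rng.normal(size=(80, 13))
gamma = model.predict_proba(new_audio)   # (80, 8) state posteriors
estimated_visual = gamma @ mu_v          # (80, 6) visual trajectory

Averaging over the posterior rather than committing to a single path is what distinguishes this soft estimate from the Viterbi-based prediction sketched further below.
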
“…Many approaches rely on non-linear statistical models which are trained on corpora of audio-visual speech and learn a mapping from some acoustic parameterization to a corresponding visual parameterization. A popular approach is to use hidden Markov models (HMMs) [13][14][15][16][17][18], which have been widely used by the speech community for decades for both speech recognition and synthesis. Chen [14] trained HMMs on joint audio-visual features, then separated the models for prediction.…”
Section: Introduction (mentioning, confidence: 99%)
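
A minimal sketch of the joint-then-separate recipe attributed to Chen [14] above, under assumed feature shapes and with hmmlearn as a stand-in: one HMM is fit on concatenated audio and visual frames, after which each state's mean splits into an audio block (usable for decoding new speech) and a visual block (usable for synthesis):

import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(2)
T, d_a, d_v = 400, 13, 6
audio = rng.normal(size=(T, d_a))    # assumed acoustic parameterization
visual = rng.normal(size=(T, d_v))   # assumed visual parameterization

# Joint audio-visual training on concatenated frames.
joint = hmm.GaussianHMM(n_components=8, covariance_type="diag", n_iter=20)
joint.fit(np.hstack([audio, visual]))

# "Separating the models": slice each state's mean into its audio and
# visual blocks; only the audio block is needed to decode new speech,
# while the visual block supplies the predicted face parameters.
audio_state_means = joint.means_[:, :d_a]    # (8, 13)
visual_state_means = joint.means_[:, d_a:]   # (8, 6)

Decoding new speech with the audio part and emitting the visual part along the decoded path is sketched after the next excerpt.
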
“…For new speech, the visual HMM was sampled using the acoustic state sequence derived from the Viterbi algorithm. Choi et al. [15] and Terissi and Gómez [16] also trained joint audio-visual HMMs but used HMM inversion (HMMI) to infer the visual parameters. Xie et al. [17] introduced coupled HMMs (CHMMs) to account for the asynchrony between audio and visual activity caused by coarticulation [19].…”
Section: Introduction (mentioning, confidence: 99%)
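
The decode-then-emit step described in this excerpt can be sketched directly: given a transition matrix, per-state audio Gaussians, and per-state visual parameters (all toy stand-ins below, not values from any cited system), a log-space Viterbi pass over the audio yields a state path whose visual means form the animation trajectory:

import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(3)
n_states, d_a, d_v, T = 4, 13, 6, 50
transmat = np.full((n_states, n_states), 0.1 / (n_states - 1))
np.fill_diagonal(transmat, 0.9)              # sticky states, toy values
startprob = np.full(n_states, 1.0 / n_states)
audio_means = rng.normal(size=(n_states, d_a))   # per-state audio Gaussians
visual_means = rng.normal(size=(n_states, d_v))  # per-state visual params
audio = rng.normal(size=(T, d_a))                # new speech to animate

# Per-frame log-likelihood of each state (unit-covariance audio Gaussians).
loglik = np.stack([multivariate_normal(audio_means[s]).logpdf(audio)
                   for s in range(n_states)], axis=1)    # (T, n_states)

# Log-space Viterbi over the acoustic observations.
delta = np.log(startprob) + loglik[0]
back = np.zeros((T, n_states), dtype=int)
for t in range(1, T):
    scores = delta[:, None] + np.log(transmat)   # scores[i, j]: from i to j
    back[t] = scores.argmax(axis=0)
    delta = scores.max(axis=0) + loglik[t]
path = np.zeros(T, dtype=int)
path[-1] = delta.argmax()
for t in range(T - 2, -1, -1):
    path[t] = back[t + 1, path[t + 1]]

# Emit the visual parameters of each decoded state.
predicted_visual = visual_means[path]            # (T, d_v) trajectory
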