2002
DOI: 10.1109/tnn.2002.1021891

An HMM-based speech-to-video synthesizer

Abstract: Emerging broadband communication systems promise a future of multimedia telephony, e.g. the addition of visual information to telephone conversations. It is useful to consider the problem of generating the critical information useful for speechreading, based on existing narrowband communications systems used for speech. This paper focuses on the problem of synthesizing visual articulatory movements given the acoustic speech signal. In this application, the acoustic speech signal is analyzed and the correspondi…

Cited by 28 publications (16 citation statements)
References 34 publications
“…The speech signal was processed in frames of 25 ms with a 15 ms overlap (rate = 100 Hz). The speech frames were pre-emphasized with an FIR filter ($H(z) = 1 - a z^{-1}$, $a = 0.97$), and weighted by a Hamming window to avoid spectral distortions. After pre-processing, we extracted Mel Frequency Cepstral Coefficients (MFCCs) as the acoustic features.…”
Section: AV Signal Processing
Citation type: mentioning
confidence: 99%
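
The pre-processing described in the statement above (25 ms frames, 15 ms overlap, pre-emphasis with a = 0.97, Hamming weighting) can be sketched roughly as follows. This is only an illustration, not the citing authors' code: the 16 kHz sampling rate, the function name `preprocess_frames`, and the choice to leave the first sample of each frame unfiltered are assumptions that the quoted text does not specify, and the MFCC extraction step is left out.

```python
import math


def preprocess_frames(signal, sample_rate=16000, frame_ms=25, overlap_ms=15, alpha=0.97):
    """Split a signal into overlapping frames, pre-emphasize, and Hamming-window each frame.

    sample_rate=16000 is an assumption; the quoted statement gives only the
    frame length (25 ms), the overlap (15 ms), and the filter coefficient (0.97).
    """
    frame_len = round(sample_rate * frame_ms / 1000)           # samples per frame
    hop = round(sample_rate * (frame_ms - overlap_ms) / 1000)  # frame shift in samples

    # Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))
    window = [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
        for n in range(frame_len)
    ]

    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # Pre-emphasis H(z) = 1 - alpha * z^-1, i.e. y[n] = x[n] - alpha * x[n-1].
        # Assumption: the first sample of each frame is passed through unchanged.
        emphasized = [frame[0]] + [
            frame[n] - alpha * frame[n - 1] for n in range(1, frame_len)
        ]
        frames.append([e * w for e, w in zip(emphasized, window)])
    return frames


if __name__ == "__main__":
    one_second = [1.0] * 16000
    frames = preprocess_frames(one_second)
    print(len(frames), len(frames[0]))
```

Under the assumed 16 kHz rate, each frame holds 400 samples and consecutive frames start 160 samples (10 ms) apart, which matches the 100 Hz frame rate given in the quoted statement.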
“…Although hearing impaired people are the ideal subjects for measuring the perceptual quality of the synthesized visual speech, it is quite difficult to get an impartial evaluation because we have to find individuals with equal levels of lipreading proficiency [1]. Therefore, we used a substitution evaluation approach, in which a group of 8 subjects with normal hearing were recruited.…”
Section: B Subjective Evaluations Via Human Lipreading
Citation type: mentioning
confidence: 99%
“…The conversion problem is treated as one of finding the best approximation from given sets of training data. These approaches were briefly discussed in Chen and Rao [10], including vector quantization [25], Hidden Markov Models (HMM) [2,3,9,13,31], and neural networks [19,20,30]. However, the speech-driven systems were generally made to be user-independent for satisfactory average performance, which means a decrease in accuracy rate for a specific user.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Once the sequence of mouth movements has been determined, the mouth is mapped back to a background face of the speaker. Other authors have proposed methods based on modeling of phonemes by correlational HMM's [5] or neural networks [6].…”
Section: Introduction
Citation type: mentioning
confidence: 99%