2002
DOI: 10.1007/3-540-45683-x_60
Audio-to-Visual Conversion Using Hidden Markov Models

Cited by 33 publications (44 citation statements)
References 9 publications
“…We use the concept of a viseme which is the lip shape during the voicing of a phoneme. Here, we use a phoneme to viseme mapping resulting in 14 visemes from the 42 English phonemes [10], [12]. Data corresponding to each viseme are grouped using the available phoneme-level transcriptions.…”
Section: Face Modality
Mentioning confidence: 99%
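The mapping described in the quotation above, collapsing the 42 English phonemes into 14 viseme classes and grouping data by viseme, can be sketched as follows. This is a minimal illustration, not the actual mapping from the cited works: the phoneme labels and viseme groupings below are hypothetical placeholders, and the real 14-class table is defined in references [10] and [12] of the citing paper.

```python
# Hypothetical, partial phoneme-to-viseme mapping for illustration only.
# The true 14-viseme mapping over all 42 phonemes comes from the cited works.
PHONEME_TO_VISEME = {
    "p": "V1", "b": "V1", "m": "V1",            # bilabials share one lip shape
    "f": "V2", "v": "V2",                       # labiodentals
    "t": "V3", "d": "V3", "s": "V3", "z": "V3", # alveolars
    "aa": "V4", "ah": "V4",                     # open vowels
}

def group_by_viseme(transcription):
    """Group phoneme-level transcription tokens by their viseme class."""
    groups = {}
    for phoneme in transcription:
        viseme = PHONEME_TO_VISEME.get(phoneme)
        if viseme is not None:
            groups.setdefault(viseme, []).append(phoneme)
    return groups

print(group_by_viseme(["p", "b", "f", "s", "aa", "m"]))
```

Because every phoneme in a class maps to the same lip shape, training data for a viseme model is simply the union of the video segments for its member phonemes, which is what the grouping step above produces.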
“…This reduction in the number of units reduces the number of possible bigrams of units by about 85% compared with using phone units, and hence reduces VLID performance. Our experience using visemes that were defined by the mapping described in [34] showed that performance was limited, and analysis of this mapping showed that it was highly over-simplified. Although we know that there are several phonemes that cannot be discriminated visually (for instance, it is impossible to detect voicing visually, or place of articulation when this is far back inside the oral cavity), we have found that VLID accuracy is enhanced by training models using video segments corresponding to 42 audio phonemes.…”
Section: B Visual Models Of Phonemes
Mentioning confidence: 99%
“…As previously, for isolated words, the Disney vowels are significantly worse than all others when paired with all consonant difference over the whole group. The Lee [71], Montgomery [67] and Bozkurt [69] vowels are consistently above the mean and above the upper error bar for Disney [66], Jeffers [70] and Hazen [19] vowels. In comparing the consonants, Lee [71] and Hazen [19] are the best whereas Woodward [75] and Franks [62] are the bottom performers.…”
Section: Comparison Of Current Phoneme-to-viseme Maps
Mentioning confidence: 95%
“…Data-driven methods are most recent, e.g. Lee's [71] visemes were presented in 2002 and Hazen's [19] in 2004. The remaining visemes are based around linguistic/phonemic rules.…”
Section: Comparison Of Phoneme-to-viseme Mappings
Mentioning confidence: 99%