Proceedings of ISCAS'95 - International Symposium on Circuits and Systems
DOI: 10.1109/iscas.1995.521548
Lip synchronization in 3-D model based coding for video-conferencing

Cited by 14 publications (9 citation statements). References 13 publications.
“…SynNar is similar to the Actors system [2] in that it does not rely on any form of 3D modeling or rendering (as in [3] for example) to create the visual appearance of the "talking head". Rather, it uses morphing techniques [4][5][6][7] to generate a photo-realistic, smoothly-interpolated video sequence from a set of keyframes which represent typical facial positions for each phoneme in the text.…”

Section: Introduction

confidence: 99%
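The keyframe-interpolation idea in the excerpt above can be sketched as a simple cross-dissolve between two viseme keyframes. This is a minimal illustration, not SynNar's actual morphing algorithm (which also warps facial feature positions, per the cited techniques); the 2x2 "images" are hypothetical toy data.

```python
def cross_dissolve(key_a, key_b, t):
    """Linearly blend two keyframe images (t in [0, 1]).

    A cross-dissolve is the simplest stand-in for morphing; the cited
    morphing techniques additionally warp feature geometry between frames.
    """
    return [[(1.0 - t) * pa + t * pb for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(key_a, key_b)]

def interpolate_sequence(key_a, key_b, n_frames):
    """Generate the in-between frames from one viseme keyframe to the next."""
    return [cross_dissolve(key_a, key_b, i / (n_frames - 1))
            for i in range(n_frames)]

# Toy 2x2 grayscale "keyframes" for two mouth positions (hypothetical data).
closed_mouth = [[0.0, 0.0], [0.0, 0.0]]
open_mouth = [[1.0, 1.0], [1.0, 1.0]]
frames = interpolate_sequence(closed_mouth, open_mouth, 5)
```

Chaining such sequences across the keyframes for successive phonemes yields the smoothly-interpolated video the excerpt describes.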
“…Provine and Bruton [3] and Waters and Levergood [8] specifically studied the various positions of major facial features (visemes or visual phonemes) during speech. Provine and Bruton's results suggest that a minimum of 20 different positions of mouth and jaw should be catered for in any facial motion synthesis system, while Waters and Levergood actually used 55 in their commercial system (DECface).…”
confidence: 99%
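The viseme inventories discussed above (at least 20 positions per Provine and Bruton, 55 in DECface) amount to a many-to-one mapping from phonemes to mouth/jaw keyframes. The grouping below is a hypothetical sketch for illustration; the actual viseme tables in those systems differ.

```python
# Hypothetical phoneme-to-viseme grouping; real inventories (>= 20 visemes
# in Provine and Bruton's results, 55 in DECface) are larger and differ.
PHONEME_TO_VISEME = {
    "p": "bilabial_closed", "b": "bilabial_closed", "m": "bilabial_closed",
    "f": "labiodental", "v": "labiodental",
    "th": "dental", "dh": "dental",
    "aa": "open_jaw", "ae": "open_jaw",
    "uw": "rounded", "ow": "rounded",
}

def to_visemes(phonemes):
    """Map a phoneme sequence to viseme keyframe labels (neutral fallback)."""
    return [PHONEME_TO_VISEME.get(p, "neutral") for p in phonemes]

print(to_visemes(["m", "aa", "p"]))
# ['bilabial_closed', 'open_jaw', 'bilabial_closed']
```

Grouping phonemes this way is what keeps the keyframe set small enough to author while still covering the visually distinct mouth positions.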
“…The resulting speech-synchronized animation is of extremely good quality if good motion capture techniques and equipment are combined with a high-quality facial model. The third type of method involves using 2D image-processing techniques [13,28,29], achieving good results for speech synchronization. The character is filmed speaking a corpus that includes all the necessary phonemes or triphones (a combination of three phonemes).…”

confidence: 99%
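A triphone, as defined in the excerpt above, is simply an overlapping window of three consecutive phonemes; enumerating them shows what a filmed corpus must cover. A minimal sketch (the phoneme labels are illustrative):

```python
def triphones(phonemes):
    """Enumerate the triphones: overlapping windows of three phonemes."""
    return [tuple(phonemes[i:i + 3]) for i in range(len(phonemes) - 2)]

# Illustrative phoneme sequence for "hello" (labels are assumptions).
print(triphones(["h", "eh", "l", "ow"]))
# [('h', 'eh', 'l'), ('eh', 'l', 'ow')]
```

Covering all triphones rather than individual phonemes captures coarticulation, which is why corpus design for such systems is expensive.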
“…Speech recognition techniques [16] can also be used for automated segmentation. The sound track can be a speech waveform [7,22,29,30,31,33] or text [22,28,36]. If required, a waveform is then created from the phonemes.…”
Section: Previous Work
confidence: 99%
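Once the soundtrack is segmented into phonemes, each phoneme needs a time span on the waveform before visemes can be scheduled. The sketch below assigns spans uniformly; this is only a placeholder for the automated segmentation (speech recognition / forced alignment) the excerpt refers to, and the function name and phoneme labels are assumptions.

```python
def uniform_segments(phonemes, duration_s):
    """Naively assign equal time spans (start, end in seconds) to each phoneme.

    Real systems derive the boundaries from speech recognition (forced
    alignment) against the waveform; equal spans are only a stand-in.
    """
    step = duration_s / len(phonemes)
    return [(p, round(i * step, 3), round((i + 1) * step, 3))
            for i, p in enumerate(phonemes)]

segs = uniform_segments(["h", "eh", "l", "ow"], 1.0)
```

Each `(phoneme, start, end)` triple then selects which viseme keyframe the renderer should display over that interval.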