Advances in Nonlinear Speech Processing
DOI: 10.1007/978-3-540-77347-4_2

Some Experiments in Audio-Visual Speech Processing

Abstract: Natural speech is produced by the vocal organs of a particular talker. The acoustic features of the speech signal must therefore be correlated with the movements of the articulators (lips, jaw, tongue, velum, ...). For instance, hearing-impaired people (and not only them) improve their understanding of speech by lip reading. This chapter is an overview of audio-visual speech processing, with emphasis on some experiments concerning recognition, speaker verification, indexing and corpus-based synthesis from tongue …

Citations: cited by 6 publications (6 citation statements)
References: 45 publications
“…DCT-mod2 [119] coefficients are another face image representations computed on normalized faces for robust audio-visual biometric systems against forgery attacks [71]. Another approach used DCT coefficients extracted from mouth region with Least Residual Error Energy (LREE) algorithm [34] and MFCCs as audio features [147].…”
Section: Mel-frequency Cepstral Coefficients
Citation type: mentioning
confidence: 99%
“…Techniques include text-independent VTLN (Sundermann et al, 2003), maximum likelihood adaptation and statistical techniques (Karam et al, 2009;Mouchtaris et al, 2004;Stylianou & Cappe, 1998), unit selection (Sundermann et al, 2006), and client memory indexation. (Chollet et al, 2007;Constantinescu et al, 1999;Perrot et al, 2005). The analysis part of a voice conversion algorithm focuses on the extraction of the speaker's identity.…”
Section: Voice Transformation
Citation type: mentioning
confidence: 99%
“…Speaker transformation techniques [28,39,85,40,2,7,77,62,16] might involve modifications of different aspects of the speech signal that carries the speaker's identity. We can cite different methods.…”
Section: Speech Processing
Citation type: mentioning
confidence: 99%
“…In text-independent voice conversion techniques, the system is trained with source and target speakers uttering different texts. Text-independent techniques include VTLN [71], maximum likelihood adaptation and statistical techniques, unit selection, and client memory indexing [62,16]. The analysis part of a voice conversion algorithm focuses on the extraction of the speaker's identity.…”
Section: Speech Processing
Citation type: mentioning
confidence: 99%