1995 International Conference on Acoustics, Speech, and Signal Processing
DOI: 10.1109/icassp.1995.479285

Toward movement-invariant automatic lip-reading and speech recognition

Abstract: We present the development of a modular system for flexible human-computer interaction via speech. The speech recognition component integrates acoustic and visual information (automatic lip-reading), improving overall recognition, especially in noisy environments. The image of the lips, constituting the visual input, is automatically extracted from the camera picture of the speaker's face by the lip locator module. Finally, the speaker's face is automatically acquired and followed by the face tracker sub-system…
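The abstract outlines a modular architecture: a face tracker acquires and follows the speaker's face, a lip locator extracts the mouth region from the tracked face, and the recognizer fuses that visual stream with the acoustic one. The sketch below only illustrates this data flow under assumed interfaces; every class and function name is a hypothetical placeholder, not the authors' implementation.

```python
# Minimal sketch of the modular arrangement described in the abstract.
# All names are hypothetical illustrations, not the authors' code.
from dataclasses import dataclass
import numpy as np


@dataclass
class Frame:
    image: np.ndarray   # full camera picture of the speaker
    audio: np.ndarray   # audio samples aligned with this video frame


class FaceTracker:
    """Follows the speaker's face and returns its bounding box."""
    def locate(self, image: np.ndarray) -> tuple[int, int, int, int]:
        h, w = image.shape[:2]
        return (0, 0, w, h)  # placeholder: whole frame


class LipLocator:
    """Extracts the mouth region from the tracked face."""
    def extract(self, image: np.ndarray, face_box) -> np.ndarray:
        x, y, w, h = face_box
        # crude illustrative heuristic: lips lie in the lower third of the face box
        return image[y + 2 * h // 3 : y + h, x : x + w]


class AudioVisualRecognizer:
    """Combines acoustic and visual feature streams (fusion not shown)."""
    def recognize(self, audio: np.ndarray, lip_image: np.ndarray) -> str:
        return "<hypothesis>"


def run_pipeline(frames):
    tracker, locator, recognizer = FaceTracker(), LipLocator(), AudioVisualRecognizer()
    for frame in frames:
        face_box = tracker.locate(frame.image)
        lips = locator.extract(frame.image, face_box)
        yield recognizer.recognize(frame.audio, lips)
```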

Cited by 54 publications (24 citation statements) | References 8 publications
“…To overcome this problem, in many methods, large training sets are used for training [9], several parameters (sometimes very sensitive to initialization) need to be tuned [3], or time-consuming preprocessing steps must be taken. In our system, we avoid such pre-processing tasks, thus making it more general and independent of the database set.…”
Section: Proposed Methods
confidence: 99%
“…Although the color composition of human skin and lips differs surprisingly little across individuals [7,8], total intensity of the reflection varies over a wide range [9]. Color values also depend strongly on the camera, frame grabber and illumination.…”
Section: Lip Segmentation
confidence: 99%
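The quoted observation, that the color composition of skin and lips is fairly stable across speakers while the reflected intensity and the camera/illumination response vary widely, motivates chromaticity-based segmentation. Below is a minimal sketch of that idea, assuming an RGB input image and an illustrative threshold value; it is not the cited authors' segmentation method.

```python
import numpy as np


def lip_mask(rgb: np.ndarray, thresh: float = 0.55) -> np.ndarray:
    """Rough chromaticity-based lip mask (illustrative only).

    Dividing by total intensity discards overall brightness and keeps
    only the color composition, which the quote notes is fairly stable
    across individuals. The threshold is a hypothetical value and would
    need tuning per camera and illumination.
    """
    rgb = rgb.astype(np.float64)
    r, g = rgb[..., 0], rgb[..., 1]
    total = rgb.sum(axis=-1) + 1e-6   # avoid division by zero
    r_chroma = r / total              # intensity-normalized red
    g_chroma = g / total              # intensity-normalized green
    # lips are redder and less green than the surrounding skin
    return r_chroma > thresh * (r_chroma + g_chroma)
```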
“…The feature definition is based on the notion of eigenfaces or eigenlips, which represent the eigenvectors of the training sets. An alternative to PCA, also very common, is the Discrete Cosine Transform (DCT), such as in (Duchnowski et al., 1995); (Pérez et al., 2005); (Hong et al., 2006); (Lucey & Potamianos, 2006). Linear Discriminant Analysis (LDA), Maximum Likelihood Data Rotation (MLLT), the Discrete Wavelet Transform and the Discrete Walsh Transform (Potamianos et al., 1998) are other methods that fit in this class and have been used for lip reading.…”
Section: Feature Vectors Definition
confidence: 99%
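Among the transform-based features listed in the quote, the DCT variant is simple to illustrate: take a 2-D DCT of a grayscale mouth region and keep only the low-frequency coefficients. The sketch below assumes a fixed-size ROI and an arbitrary number of retained coefficients; it is not a reproduction of any of the cited systems.

```python
import numpy as np
from scipy.fftpack import dct


def dct_lip_features(roi_gray: np.ndarray, n_coeffs: int = 6) -> np.ndarray:
    """2-D DCT features of a grayscale mouth ROI (illustrative sketch).

    Keeps the top-left n_coeffs x n_coeffs block of coefficients, i.e.
    the low spatial frequencies that carry the coarse mouth shape.
    """
    # separable 2-D DCT: transform rows, then columns
    coeffs = dct(dct(roi_gray.astype(np.float64), axis=0, norm="ortho"),
                 axis=1, norm="ortho")
    return coeffs[:n_coeffs, :n_coeffs].ravel()


# example: a 32x32 mouth crop yields a 36-dimensional feature vector
features = dct_lip_features(np.random.rand(32, 32))
```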
“…In some studies, lipreading combined with face and voice is used to help biometric identification [1][2][3]. There is also much work focusing on audio-visual speech recognition (AVSR) [4][5][6][7][8][9][10][11][12][13][14][15][16], trying to find effective ways of combining visual information with existing audio-only speech recognition (ASR) systems. The McGurk effect [17] demonstrates that inconsistency between audio and visual information can result in perceptual confusion.…”
Section: Introduction
confidence: 99%
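One common way to combine visual information with an existing audio-only recognizer, as the quote describes, is decision-level (late) fusion of the two streams' class scores, with the acoustic weight reduced as the audio gets noisier. The sketch below shows that weighting under assumed inputs (per-class log-likelihood vectors from each recognizer); the weight value is arbitrary, and the cited works may use other fusion schemes.

```python
import numpy as np


def fuse_log_likelihoods(audio_ll: np.ndarray,
                         visual_ll: np.ndarray,
                         audio_weight: float = 0.7) -> int:
    """Decision-level (late) fusion of per-class log-likelihoods.

    audio_ll, visual_ll: shape (n_classes,) log-likelihoods from the
    acoustic and visual recognizers. audio_weight would typically be
    lowered in noisy conditions so the visual stream dominates; the
    default here is an arbitrary illustration.
    """
    combined = audio_weight * audio_ll + (1.0 - audio_weight) * visual_ll
    return int(np.argmax(combined))
```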