1995 International Conference on Acoustics, Speech, and Signal Processing
DOI: 10.1109/icassp.1995.479285

Toward movement-invariant automatic lip-reading and speech recognition

Abstract: We present the development of a modular system for flexible human-computer interaction via speech. The speech recognition component integrates acoustic and visual information (automatic lip-reading), improving overall recognition, especially in noisy environments. The image of the lips, constituting the visual input, is automatically extracted from the camera picture of the speaker's face by the lip locator module. Finally, the speaker's face is automatically acquired and followed by the face tracker sub-system…
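The abstract outlines a modular architecture: a face tracker acquires and follows the speaker's face, a lip locator extracts the mouth region from the tracked face, and the recognizer fuses that visual stream with the acoustic one. The sketch below only illustrates this data flow under assumed interfaces; every class and function name is a hypothetical placeholder, not the authors' implementation.

```python
# Minimal sketch of the modular arrangement described in the abstract.
# All names are hypothetical illustrations, not the authors' code.
from dataclasses import dataclass
import numpy as np


@dataclass
class Frame:
    image: np.ndarray   # full camera picture of the speaker
    audio: np.ndarray   # audio samples aligned with this video frame


class FaceTracker:
    """Follows the speaker's face and returns its bounding box."""
    def locate(self, image: np.ndarray) -> tuple[int, int, int, int]:
        h, w = image.shape[:2]
        return (0, 0, w, h)  # placeholder: whole frame


class LipLocator:
    """Extracts the mouth region from the tracked face."""
    def extract(self, image: np.ndarray, face_box) -> np.ndarray:
        x, y, w, h = face_box
        # crude illustrative heuristic: lips lie in the lower third of the face box
        return image[y + 2 * h // 3 : y + h, x : x + w]


class AudioVisualRecognizer:
    """Combines acoustic and visual feature streams (fusion not shown)."""
    def recognize(self, audio: np.ndarray, lip_image: np.ndarray) -> str:
        return "<hypothesis>"


def run_pipeline(frames):
    tracker, locator, recognizer = FaceTracker(), LipLocator(), AudioVisualRecognizer()
    for frame in frames:
        face_box = tracker.locate(frame.image)
        lips = locator.extract(frame.image, face_box)
        yield recognizer.recognize(frame.audio, lips)
```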

Cited by 54 publications (24 citation statements) | References 8 publications
“…To overcome this problem, in many methods, large training sets are used for training [9], several parameters (sometimes very sensitive to initialization) need to be tuned [3], or time-consuming preprocessing steps must be taken. In our system, we avoid such pre-processing tasks, thus making it more general and independent of the database set.…”
Section: Proposed Methods
confidence: 99%
“…Although the color composition of human skin and lips differs surprisingly little across individuals [7,8], total intensity of the reflection varies over a wide range [9]. Color values also depend strongly on the camera, frame grabber and illumination.…”
Section: Lip Segmentation
confidence: 99%
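The quoted observation, that the color composition of skin and lips is fairly stable across speakers while the reflected intensity and the camera/illumination response vary widely, motivates chromaticity-based segmentation. Below is a minimal sketch of that idea, assuming an RGB input image and an illustrative threshold value; it is not the cited authors' segmentation method.

```python
import numpy as np


def lip_mask(rgb: np.ndarray, thresh: float = 0.55) -> np.ndarray:
    """Rough chromaticity-based lip mask (illustrative only).

    Dividing by total intensity discards overall brightness and keeps
    only the color composition, which the quote notes is fairly stable
    across individuals. The threshold is a hypothetical value and would
    need tuning per camera and illumination.
    """
    rgb = rgb.astype(np.float64)
    r, g = rgb[..., 0], rgb[..., 1]
    total = rgb.sum(axis=-1) + 1e-6   # avoid division by zero
    r_chroma = r / total              # intensity-normalized red
    g_chroma = g / total              # intensity-normalized green
    # lips are redder and less green than the surrounding skin
    return r_chroma > thresh * (r_chroma + g_chroma)
```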
“…The feature definition is based on the notion of eigenfaces or eigenlips, which represent the eigenvectors of the training sets. An alternative to PCA, also very common, is the Discrete Cosine Transform (DCT), such as in (Duchnowski et al., 1995); (Pérez et al., 2005); (Hong et al., 2006); (Lucey & Potamianos, 2006). Linear Discriminant Analysis (LDA), Maximum Likelihood Data Rotation (MLLT), the Discrete Wavelet Transform and the Discrete Walsh Transform (Potamianos et al., 1998) are other methods that fit in this class and have been used for lip reading.…”
Section: Feature Vectors Definition
confidence: 99%
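Among the transform-based features listed in the quote, the DCT variant is simple to illustrate: take a 2-D DCT of a grayscale mouth region and keep only the low-frequency coefficients. The sketch below assumes a fixed-size ROI and an arbitrary number of retained coefficients; it is not a reproduction of any of the cited systems.

```python
import numpy as np
from scipy.fftpack import dct


def dct_lip_features(roi_gray: np.ndarray, n_coeffs: int = 6) -> np.ndarray:
    """2-D DCT features of a grayscale mouth ROI (illustrative sketch).

    Keeps the top-left n_coeffs x n_coeffs block of coefficients, i.e.
    the low spatial frequencies that carry the coarse mouth shape.
    """
    # separable 2-D DCT: transform rows, then columns
    coeffs = dct(dct(roi_gray.astype(np.float64), axis=0, norm="ortho"),
                 axis=1, norm="ortho")
    return coeffs[:n_coeffs, :n_coeffs].ravel()


# example: a 32x32 mouth crop yields a 36-dimensional feature vector
features = dct_lip_features(np.random.rand(32, 32))
```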
“…In some studies, lipreading combined with face and voice is used to help biometric identification [1][2][3]. There is also much work focusing on audio-visual speech recognition (AVSR) [4][5][6][7][8][9][10][11][12][13][14][15][16], trying to find effective ways of combining visual information with existing audio-only speech recognition (ASR) systems. The McGurk effect [17] demonstrates that inconsistency between audio and visual information can result in perceptual confusion.…”
Section: Introduction
confidence: 99%
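One common way to combine visual information with an existing audio-only recognizer, as the quote describes, is decision-level (late) fusion of the two streams' class scores, with the acoustic weight reduced as the audio gets noisier. The sketch below shows that weighting under assumed inputs (per-class log-likelihood vectors from each recognizer); the weight value is arbitrary, and the cited works may use other fusion schemes.

```python
import numpy as np


def fuse_log_likelihoods(audio_ll: np.ndarray,
                         visual_ll: np.ndarray,
                         audio_weight: float = 0.7) -> int:
    """Decision-level (late) fusion of per-class log-likelihoods.

    audio_ll, visual_ll: shape (n_classes,) log-likelihoods from the
    acoustic and visual recognizers. audio_weight would typically be
    lowered in noisy conditions so the visual stream dominates; the
    default here is an arbitrary illustration.
    """
    combined = audio_weight * audio_ll + (1.0 - audio_weight) * visual_ll
    return int(np.argmax(combined))
```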