[Proceedings 1992] IJCNN International Joint Conference on Neural Networks
DOI: 10.1109/ijcnn.1992.226994

Neural network lipreading system for improved speech recognition

Abstract: We designed and trained a modified time-delay neural network (TDNN) to perform automatic lipreading ("speech reading") in conjunction with acoustic speech recognition, in order to improve recognition both in silent environments and in the presence of acoustic noise. The speech reader subsystem has a speaker-independent recognition accuracy of 51% (in the absence of acoustic information); the combined acoustic-visual system has a recognition accuracy of 91%, all on a ten-utterance speaker-independent…
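To make the kind of architecture described in the abstract concrete, the following PyTorch sketch shows a TDNN-style audio-visual classifier: a stack of 1-D convolutions over time for each modality, with the pooled outputs concatenated and classified. This is an illustration under assumed dimensions, not the authors' network; the feature sizes, layer widths, and ten-class output are placeholders.

# Minimal TDNN-style audio-visual fusion sketch (illustrative only).
# Feature dimensionalities, layer widths, and the 10-word vocabulary
# are placeholder assumptions, not taken from the paper.
import torch
import torch.nn as nn

class AVFusionTDNN(nn.Module):
    def __init__(self, audio_dim=16, visual_dim=8, n_words=10):
        super().__init__()
        # Time-delay layers are 1-D convolutions over the time axis.
        self.audio_net = nn.Sequential(
            nn.Conv1d(audio_dim, 32, kernel_size=3), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=5), nn.ReLU(),
        )
        self.visual_net = nn.Sequential(
            nn.Conv1d(visual_dim, 16, kernel_size=3), nn.ReLU(),
            nn.Conv1d(16, 16, kernel_size=5), nn.ReLU(),
        )
        # Late fusion: pool each stream over time, concatenate, classify.
        self.classifier = nn.Linear(32 + 16, n_words)

    def forward(self, audio, visual):
        # audio: (batch, audio_dim, T_audio); visual: (batch, visual_dim, T_visual)
        a = self.audio_net(audio).mean(dim=2)    # average over time
        v = self.visual_net(visual).mean(dim=2)
        return self.classifier(torch.cat([a, v], dim=1))

model = AVFusionTDNN()
logits = model(torch.randn(4, 16, 50), torch.randn(4, 8, 25))
print(logits.shape)  # torch.Size([4, 10])

The late-fusion choice here (pool each stream, then concatenate) is only one of several integration strategies discussed in the literature that cites this paper.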

Cited by 77 publications (37 citation statements)
References 21 publications
“…Moreover, by using a method common to both the audio and visual aspects of speech, there is the potential for a more straightforward combination of results obtained from separate audio and visual investigations and such integration has often been carried out using machine learning techniques, such as time delay neural network (TDNN) [42], support vector machines (SVM) [43] and AdaBoost [44].…”
Section: Speech Classification Based On Lip Features
confidence: 99%
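As a toy illustration of the feature-level integration described in the statement above, the following scikit-learn sketch trains an SVM on concatenated audio and visual feature vectors. The data are synthetic and the dimensions and class count are arbitrary assumptions; it does not reconstruct any cited system.

# Toy feature-level audio-visual fusion with an SVM (synthetic data;
# dimensions and labels are placeholders, not from any cited system).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, audio_dim, visual_dim, n_classes = 300, 12, 6, 4
y = rng.integers(0, n_classes, size=n)
# Class-dependent means so the toy problem is learnable.
audio = rng.normal(loc=y[:, None], scale=1.0, size=(n, audio_dim))
visual = rng.normal(loc=y[:, None], scale=1.0, size=(n, visual_dim))

X = np.hstack([audio, visual])          # concatenation = feature-level fusion
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print("fused accuracy:", clf.score(X_te, y_te))

Concatenating features before a single classifier is the simplest form of such integration; other schemes combine per-modality classifier outputs instead.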
“…In fixed lexicon systems, conditional independence is usually assumed at the word level, with very good results (Stork et al., 1992; Bregler et al., 1993b; Adjoudani & Benoit, 1995; Movellan, 1995). While our approach does not require the assumption of conditional independence, it greatly simplifies the computations.…”
Section: Competitive Models and Robustification
confidence: 99%
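For reference, the word-level conditional-independence assumption mentioned in this statement can be written out explicitly. With acoustic observations a, visual observations v, and word hypothesis w, the standard factorization (shown here as a generic formulation, not a transcription of any cited derivation) is

\[
  P(w \mid a, v) \;\propto\; P(a, v \mid w)\, P(w)
                 \;=\; P(a \mid w)\, P(v \mid w)\, P(w),
\]

where the second equality holds only if a and v are assumed conditionally independent given w. In log space the two modality scores then simply add, which is the computational simplification the quoted passage refers to.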
“…Recent years have seen a dramatic flourishing of the engineering literature on AVSR (Yuhas et al., 1990; Wu et al., 1991; Stork et al., 1992; Bregler et al., 1993b; Cosi et al., 1994; Bregler et al., 1994; Wolff et al., 1994; Hennecke et al., 1994; de Sa, 1994; Movellan, 1995). Current interest in AVSR is in part due to the popularization of digital multimedia tools, its potential application to automatic speech recognition in noisy environments (e.g., car telephony, airplane cockpits, noisy offices), and its links to fundamental theoretical issues in engineering and in cognitive science (Movellan & Chadderdon, 1996).…”
Section: Audio Visual Speech Recognition
confidence: 99%
“…For example, in [6] a time-delay neural network (TDNN) is applied in an automatic lipreading system to fuse audio and visual data. In [11], another TDNN is applied to visual and audio data to detect when and where a person is speaking in a scene. A major drawback of these networks is the problem of catastrophic forgetting; i.e., learned associations from input data to output classes could be adversely influenced if the network is trained online.…”
Section: Related Work
confidence: 99%
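The catastrophic-forgetting issue raised in this last statement is easy to reproduce in miniature. The sketch below (synthetic data; the model, dimensions, and class split are arbitrary assumptions, unrelated to the cited systems) trains a linear classifier online on one pair of classes and then only on a second pair, after which accuracy on the first pair typically collapses.

# Toy demonstration of catastrophic forgetting under online training.
# Data, dimensions, and the class split are synthetic placeholders.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(1)

def make_task(labels, n=400, dim=20):
    # Each class gets a shifted Gaussian cloud so the task is learnable.
    y = rng.choice(labels, size=n)
    X = rng.normal(size=(n, dim)) + y[:, None]
    return X, y

X_a, y_a = make_task([0, 1])          # "task A" classes
X_b, y_b = make_task([2, 3])          # "task B" classes

clf = SGDClassifier(random_state=0)
classes = np.array([0, 1, 2, 3])

clf.partial_fit(X_a, y_a, classes=classes)
print("task A accuracy after training on A:", clf.score(X_a, y_a))

for _ in range(20):                   # keep training online on B only
    clf.partial_fit(X_b, y_b)
print("task A accuracy after training on B:", clf.score(X_a, y_a))  # typically drops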