IEEE International Conference on Acoustics Speech and Signal Processing 1993
DOI: 10.1109/icassp.1993.319179

Improving connected letter recognition by lipreading

Abstract: In this paper we show how recognition performance in automated speech perception can be significantly improved by additional lipreading, so-called "speechreading". We show this on an extension of an existing state-of-the-art speech recognition system, a modular MS-TDNN. The acoustic and visual speech data are preclassified in two separate front-end phoneme TDNNs and combined into acoustic-visual hypotheses for the Dynamic Time Warping algorithm. This is shown on a connected word recognition problem, the notorious…
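
The abstract describes two front-end phoneme TDNNs whose per-frame outputs are merged into joint acoustic-visual hypotheses before DTW decoding. A minimal sketch of such a combination step, assuming per-frame phoneme posteriors from each modality and a single tunable acoustic weight (the function name, the log-linear weighting rule and the weight value are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def combine_av_hypotheses(acoustic_post, visual_post, acoustic_weight=0.7):
    """Merge per-frame phoneme posteriors from an acoustic and a visual
    front-end into joint acoustic-visual scores for the DTW decoder.

    acoustic_post, visual_post: arrays of shape (n_frames, n_phonemes).
    The log-linear weighting below is an illustrative choice, not the
    paper's exact combination rule.
    """
    eps = 1e-12  # avoid log(0)
    lam = acoustic_weight
    return lam * np.log(acoustic_post + eps) + (1.0 - lam) * np.log(visual_post + eps)

# toy usage: 5 frames, 3 phoneme classes per front-end
acoustic = np.random.dirichlet(np.ones(3), size=5)
visual = np.random.dirichlet(np.ones(3), size=5)
print(combine_av_hypotheses(acoustic, visual).shape)  # (5, 3)
```

A weighting of this kind would normally be tuned on held-out data, e.g. to favour the acoustic stream less as noise increases; the paper's actual rule is not reproduced here.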

Cited by 93 publications (70 citation statements)
References 9 publications

“…As the application domain is the same, lip reading classification techniques are often the same as those applied in the audio speech recognition (ASR) field and, consequently, dynamic time warping (DTW) [39,40] and HMMs [10,18,41] are popular. Moreover, by using a method common to both the audio and visual aspects of speech, there is the potential for a more straightforward combination of results obtained from separate audio and visual investigations, and such integration has often been carried out using machine learning techniques such as time delay neural networks (TDNN) [42], support vector machines (SVM) [43] and AdaBoost [44].…”
Section: Speech Classification Based On Lip Features (mentioning)
confidence: 99%
“…DTW utilizes dynamic programming to generate candidate stretched and compressed sections in sequences of feature vectors, in order to find an alignment between two time-series that minimizes distortion [57]; in doing so it produces a warping function that minimizes the total distance (normally Euclidean) between an unknown sample and the reference template. While a DTW-based lip reading system has been proposed previously [39,40,58], to the best of the authors' knowledge the problem has not been addressed using multi-dimensional DTW and reference template probabilities.…”
Section: Template Probability Multi-Dimensional Dynamic Time Warping (mentioning)
confidence: 99%
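
The statement above summarizes DTW as a dynamic-programming search for the warping path that minimizes cumulative (typically Euclidean) distance between an unknown sample and a reference template. A minimal single-template sketch, assuming both sequences are NumPy arrays of feature vectors (the interface is illustrative and does not implement the cited multi-dimensional, template-probability variant):

```python
import numpy as np

def dtw_distance(sample, template):
    """Classic DTW: cost of the best warping path between two sequences of
    feature vectors, using Euclidean local distance and the standard
    match / stretch / compress transitions."""
    n, m = len(sample), len(template)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(sample[i - 1] - template[j - 1])
            acc[i, j] = cost + min(acc[i - 1, j],      # stretch
                                   acc[i, j - 1],      # compress
                                   acc[i - 1, j - 1])  # match
    return acc[n, m]

# toy usage: recognition as nearest template under DTW distance
sample = np.random.randn(12, 2)
templates = {"a": np.random.randn(9, 2), "b": np.random.randn(15, 2)}
print(min(templates, key=lambda k: dtw_distance(sample, templates[k])))
```
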
“…In the previous study [5] … In this paper [6], the system proposed by Ralph Krike et al. operates under active near-infrared illumination; it uses local binary patterns to model lip motions with hidden Markov models.…”
Section: Figure 2: Appearance Of Lip During Various Sound (mentioning)
confidence: 99%
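
The cited system is described as modelling lip motion with local binary patterns (LBP) fed to hidden Markov models. A minimal sketch of the basic 8-neighbour LBP code for one grayscale mouth-region frame (a textbook LBP, not necessarily the cited authors' exact variant; per-frame code histograms would then form the HMM observation sequence):

```python
import numpy as np

def lbp_8neighbour(frame):
    """Basic 3x3 local binary pattern codes for a grayscale image: each
    interior pixel gets an 8-bit code comparing it with its neighbours."""
    f = frame.astype(np.int32)
    center = f[1:-1, 1:-1]
    # offsets of the 8 neighbours, clockwise from the top-left corner
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = f[1 + dy:f.shape[0] - 1 + dy, 1 + dx:f.shape[1] - 1 + dx]
        codes |= (neighbour >= center).astype(np.int32) << bit
    return codes

# toy usage: LBP-code histogram over a synthetic 32x48 mouth region
frame = np.random.randint(0, 256, size=(32, 48), dtype=np.uint8)
hist, _ = np.histogram(lbp_8neighbour(frame), bins=256, range=(0, 256))
print(hist.sum())  # one code per interior pixel: 30 * 46 = 1380
```
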
“…The solution we propose is based on this approach but, contrary to published works, e.g. [1,4], our systems deal with the above-mentioned phenomenon of temporal shift between the audio and visual sources.…”
Section: Integration Of Visual Information (mentioning)
confidence: 99%
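
The statement refers to handling a temporal shift between the audio and visual streams. A minimal sketch that estimates a single global frame offset by maximizing the correlation of the two streams' frame-energy profiles (this correlation-based estimate is an illustrative placeholder, not the cited authors' method):

```python
import numpy as np

def estimate_av_offset(audio_energy, visual_energy, max_shift=10):
    """Estimate a global lag (in frames) between equal-rate audio and visual
    streams by maximizing the correlation of their z-scored energy profiles.
    A positive result means the visual stream lags the audio stream."""
    a = (audio_energy - audio_energy.mean()) / (audio_energy.std() + 1e-12)
    v = (visual_energy - visual_energy.mean()) / (visual_energy.std() + 1e-12)
    n = min(len(a), len(v))
    best_shift, best_corr = 0, -np.inf
    for s in range(-max_shift, max_shift + 1):
        if s >= 0:
            corr = float(np.dot(a[:n - s], v[s:n])) / (n - s)
        else:
            corr = float(np.dot(a[-s:n], v[:n + s])) / (n + s)
        if corr > best_corr:
            best_shift, best_corr = s, corr
    return best_shift

# toy usage: the visual profile is the audio profile delayed by 3 frames
rng = np.random.default_rng(0)
audio = rng.random(200)
visual = np.roll(audio, 3)
print(estimate_av_offset(audio, visual))  # 3 for this toy example
```
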