2008
DOI: 10.1007/978-3-540-85920-8_74

Lip-Reading Technique Using Spatio-Temporal Templates and Support Vector Machines

Abstract: This paper presents a lip-reading technique to identify the unspoken phones using support vector machines. The proposed system is based on temporal integration of the video data to generate spatio-temporal templates (STT). 64 Zernike moments (ZM) are extracted from each STT. This work proposes a novel feature selection algorithm to reduce the dimensionality of the 64 ZM to 12 features. The proposed technique uses the shape of the probability curve as a goodness measure for optimal feature selection. The f…

Citations: cited by 6 publications (2 citation statements)
References: 14 publications
“…Since the ASR system needs to treat various lengths of temporal features, dynamic time warping methods [13] or hidden Markov models (HMM) [14,15] have been widely used to handle the temporal data. Other researchers have used artificial neural networks (ANN) [16,17] and support vector machine (SVM) [18] that are renowned for its excellent generalization performance. However, these methods are not ideal for lip reading applications because they require a huge amount of training data, and would require extensive retraining any time a new word class is added to the database.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Various variants of HMMs have also been used for audio-visual ASR, such as HMMs with non-Gaussian continuous observation probabilities [39]. Moreover, additional methods to overcome the difference in the speed of speaking for classification have been employed in audio-visual ASR systems, such as dynamic time warping (DTW), used by Petajan [4] are computationally expensive and inaccurate, while other classifiers that allow the difference among speakers to be considered for classifying the visual data have used artificial neural networks (ANN) [40,41], hybrid ANN-DTW systems [42], hybrid ANN-HMM [43] and recently the support vector machines (SVM) [44]. SVM is based on the structural risk minimization principle in contrast to empirical risk minimization on which many classifiers are based.…”
Section: Tvc751_source
Citation type: mentioning
confidence: 99%