“…Several variants of HMMs have also been used for audio-visual ASR, such as HMMs with non-Gaussian continuous observation probabilities [39]. Moreover, additional methods have been employed in audio-visual ASR systems to compensate for differences in speaking rate during classification, such as dynamic time warping (DTW), used by Petajan [4], although such methods are computationally expensive and inaccurate. Other classifiers that allow differences among speakers to be taken into account when classifying the visual data include artificial neural networks (ANN) [40,41], hybrid ANN-DTW systems [42], hybrid ANN-HMM systems [43] and, more recently, support vector machines (SVM) [44]. SVM is based on the structural risk minimization principle, in contrast to the empirical risk minimization on which many classifiers are based.…”
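
To illustrate how DTW compensates for differences in speaking rate, the following is a minimal sketch of the textbook dynamic-programming formulation. It is not the implementation used in [4]. The function name `dtw_distance`, the absolute-difference local cost and the one-dimensional toy sequences are assumptions made for illustration only; real visual-speech features would be multi-dimensional vectors.

```python
import math


def dtw_distance(a, b):
    """Accumulated cost of the best monotonic alignment of sequences a and b.

    Illustrative sketch: 1-D sequences with absolute difference as the
    local cost (an assumption, not taken from the cited work).
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Allow a step in a only, in b only, or in both (a match).
            cost[i][j] = local + min(
                cost[i - 1][j],
                cost[i][j - 1],
                cost[i - 1][j - 1],
            )
    return cost[n][m]


# Hypothetical feature trajectories: the same "utterance" spoken at two speeds,
# and a different trajectory for comparison.
fast = [0.0, 1.0, 2.0, 1.0, 0.0]
slow = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 0.0, 0.0]
other = [2.0, 1.0, 0.0, 1.0, 2.0]

same_word = dtw_distance(fast, slow)
different_word = dtw_distance(fast, other)
```

Here `same_word` is zero because `slow` is simply `fast` with every frame repeated, whereas `different_word` is positive. The nested loops fill an n-by-m table, so the cost grows with the product of the two sequence lengths, which is one reason template matching of this kind becomes expensive as the vocabulary grows.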