Hidden Conditional Random Fields for Visual Speech Recognition

Pass, Adrian; Zhang, Jianguo; Stewart, Darryl

doi:10.1109/imvip.2009.28

Cited by 1 publication

(2 citation statements)

References 10 publications

“…This is demonstrated through phoneme to viseme mapping; for example the phonemes /g/, /N/, /k/ all appear to share the same corresponding viseme. Using a window based HCRF in a speaker dependent isolated digit recognition task in [5] we demonstrated that visual speech recognition performance can be improved by adopting a contextual approach to visual speech recognition. Due to excessive training times however, this technique was found to be impractical for a larger speaker independent task.…”

Section: Introduction (mentioning)

confidence: 98%

Inter-frame contextual modelling for visual speech recognition

Pass, Hanna, et al. 2010

2010 IEEE International Conference on Image Processing

Self Cite

In this paper, we present a new approach to visual speech recognition which improves contextual modelling by combining InterFrame Dependent and Hidden Markov Models. This approach captures contextual information in visual speech that may be lost when a Hidden Markov Model is used alone. We apply contextual modelling to a large speaker-independent isolated digit recognition task, and compare our approach to two commonly adopted feature-based techniques for incorporating speech dynamics. Results are presented for the baseline feature-based systems and for the combined modelling technique. We show that the two techniques achieve similar levels of performance when used independently, but that combining them yields significant improvements. In particular, we report a relative Word Error Rate improvement in excess of 17% over our best baseline system.

“…It was shown in [13] that for audio speech recognition using phonemes, this technique outperforms the standard HMM using dynamic features. We carry this and the work in [5] forward by applying the system to the task of isolated digit, visual speech recognition, and evaluate the performance against the standard feature based approaches.…”

Section: Introduction (mentioning)

confidence: 99%