“…The study reported in (Zhang and Rudnicky, 2001) included acoustic features, language model features, word lattice features, N-best features, and parser-based features derived from the language model features and the grammar (parsing-mode and slot-backoff-mode) as input features for three different post-classifiers (DT, neural network, and support vector machine (SVM)) in an LVCSR system. Recent work (Goldwater et al., 2009) has proposed disfluency-based features, speaker sex, broad class-based features, turn boundary-based features, language model-based features, pronunciation-based features (word length, number of pronunciations, number of homophones, number of neighbors, and frequency-weighted homophones/neighbors), and prosodic features (pitch, intensity, speech rate, duration, and log jitter), and concluded that extreme prosodic values, words following a speaker turn, and words preceding a disfluent interruption contribute most to a high word error rate (WER). To the best of our knowledge, there has been no comparably systematic analysis of relevant features for STD.…”