Acoustic backing-off was recently proposed as an operationalisation of missing feature theory for increased recognition robustness. Acoustic backing-off effectively removes the detrimental influence of outlier values from the local decisions in the Viterbi algorithm without any kind of explicit outlier detection. In the context of connected digit recognition over telephone lines, it is shown that with more than 30% of the static mel-frequency cepstral coefficients disturbed, acoustic backing-off is capable of reducing the word error rate by one order of magnitude. Furthermore, our results indicate that the effectiveness of acoustic backing-off is optimal when the dispersion of distortions due to acoustic feature transformations is minimal.
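The abstract above does not state the formula, so the following is only a minimal sketch of one way such a bounded local score could look. It assumes the per-feature Gaussian likelihood is mixed with a small uniform "floor" distribution, so that a single outlier coefficient can no longer dominate the local Viterbi decision. The function names, the mixing weight `eps` and the width `feature_range` are hypothetical illustration parameters, not values taken from the paper.

```python
import math


def gaussian_log_pdf(x, mean, var):
    """Log density of a univariate Gaussian N(x; mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def backed_off_log_pdf(x, mean, var, eps=0.01, feature_range=100.0):
    """Log of (1 - eps) * N(x; mean, var) + eps * U(x).

    U is assumed uniform over an interval of width `feature_range`
    (an illustrative assumption). For values close to the mean the score
    is almost the plain Gaussian score; for outliers it saturates near
    log(eps / feature_range) instead of falling without bound.
    """
    log_main = math.log(1.0 - eps) + gaussian_log_pdf(x, mean, var)
    log_floor = math.log(eps) - math.log(feature_range)
    # Numerically stable log-sum-exp of the two mixture terms.
    top = max(log_main, log_floor)
    return top + math.log(math.exp(log_main - top) + math.exp(log_floor - top))


if __name__ == "__main__":
    for value in (0.0, 2.0, 50.0):
        plain = gaussian_log_pdf(value, 0.0, 1.0)
        robust = backed_off_log_pdf(value, 0.0, 1.0)
        print(f"x={value:5.1f}  plain={plain:10.2f}  backed-off={robust:8.2f}")
```

Under these assumptions no outlier test is needed: every frame is scored by the same function, and the floor term simply takes over once the Gaussian term becomes negligible.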
Recent research on the TIMIT corpus suggests that longer-length acoustic models are more appropriate for pronunciation variation modelling than the context-dependent phones that conventional automatic speech recognisers use. However, the impressive speech recognition results obtained with longer-length models on TIMIT remain to be reproduced on other corpora. To understand the conditions in which longer-length acoustic models result in considerable improvements in recognition performance, we carry out recognition experiments on both TIMIT and the Spoken Dutch Corpus and analyse the differences between the two sets of results. We establish that the details of the procedure used for initialising the longer-length models have a substantial effect on the speech recognition results. When initialised appropriately, longer-length acoustic models that borrow their topology from a sequence of triphones cannot capture the pronunciation variation phenomena that hinder recognition performance the most.
In this paper we introduce a novel method for clustering speech gestures, represented as continuous trajectories in acoustic parameter space. Trajectory Clustering allows us to avoid the conditional independence assumption that makes it difficult to account for the fact that successive measurements of an articulatory gesture are correlated. We apply the Trajectory Clustering method to develop multiple parallel HMMs for a continuous digit recognition task. We compare the performance obtained with data-driven clustering to the recognition performance obtained with conventional Head-Body-Tail models, which use knowledge-based criteria for building multiple HMMs in order to obviate the trajectory folding problem. The results show that Trajectory Clustering is able to discover structure in the training database that is different from the structure assumed by the knowledge-based approach. In addition, the data-derived structure gives rise to significantly better recognition performance, and results in a 10% word error rate reduction.