In large-vocabulary continuous speech recognition, subword units must be used for practical reasons. Context-dependent phone models have become a very successful class of subword units. These phone-sized models take into account the neighboring phonetic contexts, which strongly affect the realization of a phone. However, previous approaches have considered only intraword coarticulation and have ignored interword coarticulation, which is very important in continuous speech, especially for short function words like “the” and “a.” This study extends triphone-based modeling to interword coarticulation. A simple extension of triphones is problematic because the number of triphones grows sharply. To contain this growth, a maximum-likelihood clustering procedure was introduced to reduce 7057 intraword and interword triphones to 1000 generalized triphones. Interword generalized triphones were incorporated into a large-vocabulary, speaker-independent, continuous speech recognizer, SPHINX [K. F. Lee and H. W. Hon, Large Vocabulary Speaker-Independent Continuous Speech Recognition (ICASSP, 1988)]. This addition reduced the number of errors by as much as 44% on the 1000-word DARPA resource management task, demonstrating the importance of interword coarticulation modeling and the effectiveness of the methods used.
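The clustering step mentioned above can be illustrated with a minimal sketch. The abstract says only that a maximum-likelihood clustering procedure reduced 7057 triphones to 1000 generalized triphones. Everything else here is an assumption made for illustration: each triphone is summarized by a vector of discrete output counts, clusters are merged greedily in pairs, and the pair chosen at each step is the one whose merge loses the least training-data log-likelihood. The function names and the toy triphone labels are hypothetical and are not taken from SPHINX.

```python
import math
from itertools import combinations


def log_likelihood(counts):
    """Log-likelihood of the counts under their own maximum-likelihood multinomial."""
    total = sum(counts)
    return sum(c * math.log(c / total) for c in counts if c > 0)


def merge_loss(a, b):
    """Log-likelihood lost when two count vectors are forced to share one distribution."""
    merged = [x + y for x, y in zip(a, b)]
    return log_likelihood(a) + log_likelihood(b) - log_likelihood(merged)


def cluster_triphones(counts_by_triphone, target):
    """Greedily merge the cheapest pair of clusters until `target` clusters remain.

    Assumption: this exhaustive pairwise search is only practical for small
    inventories; it is a sketch, not the procedure used in the paper.
    """
    # Each cluster is keyed by the tuple of triphone names it contains.
    clusters = {(name,): list(c) for name, c in counts_by_triphone.items()}
    while len(clusters) > target:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: merge_loss(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = [x + y for x, y in zip(clusters.pop(a), clusters.pop(b))]
        clusters[a + b] = merged
    return clusters


# Hypothetical toy data: output-symbol counts for four triphones of the phone "ae",
# labeled phone(left context, right context).
toy = {
    "ae(k,t)": [8, 1, 1],
    "ae(g,t)": [7, 2, 1],
    "ae(m,n)": [1, 1, 8],
    "ae(n,n)": [1, 2, 7],
}

generalized = cluster_triphones(toy, target=2)
for members, counts in generalized.items():
    print(members, counts)
```

In this toy case the two triphones with similar count profiles in each group end up sharing a model, which is the effect a generalized triphone is meant to achieve: contexts that influence the phone similarly pool their training data.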