In recent years, statistical modeling approaches have steadily gained popularity in the field of information retrieval. This article presents an HMM/N-gram-based retrieval approach for Mandarin spoken documents. The underlying characteristics and the various structures of this approach were extensively investigated and analyzed. The retrieval capabilities were verified by tests with word- and syllable-level indexing features and by comparisons to the conventional vector-space model approach. To further improve the discrimination capabilities of the HMMs, both the expectation-maximization (EM) and minimum classification error (MCE) training algorithms were applied. Fusion of information via word- and syllable-level indexing features was also investigated. The spoken document retrieval experiments were performed on the Topic Detection and Tracking Corpora (TDT-2 and TDT-3). Very encouraging retrieval performance was obtained.
INTRODUCTION
Over the past three decades, statistical modeling approaches for speech and language processing have been studied extensively. Among these approaches, hidden Markov modeling (HMM) for speech recognition is undoubtedly the most prevalent and effective [Jelinek 1997]. In this approach, a set of statistical phoneme- or word-level HMMs is trained beforehand with a labeled speech corpus; the probability of the test speech utterance with respect to the HMMs is then evaluated on the HMM network to find the phoneme or word sequence with the maximum likelihood. This statistical paradigm was first introduced for the information retrieval problem by BBN Technologies [Miller et al. 1999], by Ponte and Croft [1998], and by Song and Croft [1999], where it showed very good potential, and was then extended in a number of