2004
DOI: 10.1145/1034780.1034784

A discriminative HMM/N-gram-based retrieval approach for Mandarin spoken documents

Abstract: In recent years, statistical modeling approaches have steadily gained in popularity in the field of information retrieval. This article presents an HMM/N-gram-based retrieval approach for Mandarin spoken documents. The underlying characteristics and the various structures of this approach were extensively investigated and analyzed. The retrieval capabilities were verified by tests with word- and syllable-level indexing fea…

Citations: cited by 27 publications (36 citation statements)
References: 27 publications
“…We use the Mandarin Chinese collection of the TDT corpora for the retrospective retrieval task [9], such that the statistics for the entire document collection is obtainable. The Chinese news stories (text) from Xinhua News Agency are used as our test queries (or training query exemplars) and training corpus for all topic models (excluding test query set and training query exemplars).…”
Section: Methods (mentioning)
confidence: 99%
“…On the other hand, subword-level indexing features behave more robustly against the Chinese word tokenization ambiguity, homophone ambiguity, open vocabulary problem, and speech recognition errors; hence, subword-based retrieval enhances recall. Accordingly, there is good reason to fuse the information obtained from indexing the features of different levels [9].…”
Section: Subword-level Index Units (mentioning)
confidence: 99%
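The fusion idea in the statement above can be illustrated with a minimal sketch. It assumes a toy overlap score and a hypothetical linear weight `weight`; neither is taken from the cited work, and the function names are illustrative only. The point is that overlapping syllable bigrams can still match when word-level tokenization of the query and the document disagree.

```python
def overlapping_ngrams(units, n=2):
    """Return overlapping n-grams (as tuples) over a sequence of units."""
    return [tuple(units[i:i + n]) for i in range(len(units) - n + 1)]


def overlap_score(query_feats, doc_feats):
    """Toy score: fraction of query features that also occur in the document."""
    if not query_feats:
        return 0.0
    doc_set = set(doc_feats)
    return sum(1 for f in query_feats if f in doc_set) / len(query_feats)


def fused_score(q_words, d_words, q_syls, d_syls, weight=0.5):
    """Linearly combine a word-level and a syllable-bigram-level score.

    `weight` is a hypothetical tuning parameter, not a value from the source.
    """
    word = overlap_score(q_words, d_words)
    syl = overlap_score(overlapping_ngrams(q_syls), overlapping_ngrams(d_syls))
    return weight * word + (1 - weight) * syl


# The query treats "taibeishi" as one word; the document transcript split it.
q_words = ["taibeishi"]
d_words = ["taibei", "shi", "zhengfu"]
q_syls = ["tai", "bei", "shi"]
d_syls = ["tai", "bei", "shi", "zheng", "fu"]
print(fused_score(q_words, d_words, q_syls, d_syls))
```

Here the word-level score is zero because the tokenizations differ, while both syllable bigrams of the query occur in the document, so the fused score stays above zero.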
“…This method was further improved upon in Song and Croft [1999]. Chen et al [2004] applied Song and Croft's method to Mandarin SDR using 1-best ASR transcripts. In this task, it was also shown to outperform tf · idf (with logarithmically adjusted document and query term frequencies).…”
Section: Retrieval Via Statistical Language Modeling (mentioning)
confidence: 99%
“…For example, the n-gram modeling (especially the bigram and trigram modeling) approach, which determines the probability of a word given the preceding n-1 word history, is most prominently used [Jelinek and Mercer 1980; Rosenfeld 2000; Bellegarda 2004]. This statistical paradigm was first introduced for the information retrieval (IR) problems by Ponte and Croft [1998], Song and Croft [1999], and Miller et al [1999], indicating very good potential, and was then extended in a number of publications [Berger and Lafferty 1999; Hoffmann 1999; Lafferty and Zhai 2001; Chen et al 2004b]. In these approaches, the relevance measure between a query Q and a document D is expressed as P(D|Q); that is, the probability that D is relevant given that the query Q is posed.…”
Section: Introduction (mentioning)
confidence: 99%
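As a rough illustration of the relevance measure described in the statement above: by Bayes' rule, P(D|Q) is proportional to P(Q|D)P(D), so under a uniform document prior, ranking by P(D|Q) reduces to ranking by the query likelihood P(Q|D). The sketch below assumes a unigram document model linearly interpolated with a collection model; the interpolation weight `lam` and the function names are assumptions for illustration, not the configuration of any cited paper.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, collection, lam=0.5):
    """log P(Q|D) with a unigram document model smoothed by a collection model.

    `lam` is an assumed interpolation weight, not a value from the source.
    """
    doc_counts = Counter(doc)
    coll_counts = Counter(collection)
    doc_len = len(doc)
    coll_len = len(collection)
    score = 0.0
    for term in query:
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_coll = coll_counts[term] / coll_len if coll_len else 0.0
        p = lam * p_doc + (1 - lam) * p_coll
        if p == 0.0:
            # The term is unseen in the whole collection.
            return float("-inf")
        score += math.log(p)
    return score


def rank(query, docs, lam=0.5):
    """Rank document names by query likelihood, assuming a uniform prior P(D)."""
    collection = [term for doc in docs.values() for term in doc]
    scores = {
        name: query_log_likelihood(query, doc, collection, lam)
        for name, doc in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


docs = {
    "a": ["mandarin", "speech", "retrieval"],
    "b": ["stock", "market", "news"],
}
print(rank(["speech", "retrieval"], docs))
```

The collection model keeps a document from scoring zero just because one query term is missing from it, which matters for short or error-prone transcripts.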