Proceedings of the Fourth International Conference on Spoken Language Processing (ICSLP '96)
DOI: 10.1109/icslp.1996.607085

Modeling long distance dependence in language: topic mixtures vs. dynamic cache models

Abstract: In this paper, we investigate a new statistical language model which captures topic-related dependencies of words within and across sentences. First, we develop a sentence-level mixture language model that takes advantage of the topic constraints in a sentence or article. Second, we introduce topic-dependent dynamic cache adaptation techniques in the framework of the mixture model. Experiments with the static (or unadapted) mixture model on the 1994 WSJ task indicated a 21% reduction in perplexity and a 3-4% i…
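For readers skimming the abstract, the sentence-level mixture it describes has a simple functional form: each sentence is scored in full under every topic model, and the topic-level scores are then combined. The sketch below illustrates that form only; the toy bigram tables, vocabulary size, and uniform mixture weights are hypothetical placeholders, not the models or smoothing used in the paper.

```python
import math

# Hypothetical topic-specific bigram models: topic -> {(prev, word): prob}.
# In the paper these would be n-gram models trained on topic-clustered WSJ text.
topic_lms = [
    {("the", "market"): 0.2, ("market", "fell"): 0.1},   # e.g. a "finance" topic
    {("the", "senate"): 0.2, ("senate", "voted"): 0.1},  # e.g. a "politics" topic
]
mixture_weights = [0.5, 0.5]  # prior topic weights lambda_k, assumed uniform here
VOCAB_SIZE = 20000            # assumed vocabulary size for the uniform backoff

def topic_bigram_prob(lm, prev, word, alpha=0.4):
    """Bigram probability under one topic model, smoothed with a uniform backoff."""
    return alpha * lm.get((prev, word), 0.0) + (1 - alpha) / VOCAB_SIZE

def sentence_mixture_prob(sentence):
    """Sentence-level mixture: P(S) = sum_k lambda_k * prod_i P_k(w_i | w_{i-1}).

    The key point of the sentence-level formulation is that the sum over topics
    sits outside the product over words, so a whole sentence is scored coherently
    under each topic rather than mixing topics word by word.
    """
    words = sentence.lower().split()
    total = 0.0
    for lam, lm in zip(mixture_weights, topic_lms):
        log_p = sum(
            math.log(topic_bigram_prob(lm, prev, word))
            for prev, word in zip(["<s>"] + words, words)
        )
        total += lam * math.exp(log_p)
    return total

if __name__ == "__main__":
    print(sentence_mixture_prob("the market fell"))
```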

Cited by 47 publications (47 citation statements)
References 21 publications
“…Individual data sources will be more appropriate depending on the task, for example, broadcast news or conversational telephone speech. To reduce the mismatch between the interpolated model and the target domain of interest, interpolation weights may be tuned by minimizing the perplexity on some held-out data similar to the target domain (Jelinek and Mercer, 1980; Kneser and Steinbiss, 1993; Iyer et al., 1994; Bahl et al., 1995; Rosenfeld, 1996, 2000; Jelinek, 1997; Clarkson and Robinson, 1997; Kneser and Peters, 1997; Seymore and Rosenfeld, 1997; Iyer and Ostendorf, 1999). These weights indicate the "usefulness" of each source for a particular task.…”
Section: Introduction (mentioning)
confidence: 99%
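The weight tuning this passage refers to is usually carried out with an EM-style procedure that maximizes held-out likelihood (equivalently, minimizes held-out perplexity). A minimal sketch under that assumption follows; the function name and the toy per-word probabilities are illustrative, not taken from any of the cited papers.

```python
def tune_interpolation_weights(component_probs, heldout, n_iter=20):
    """component_probs: one list of per-word probabilities P_m(w | h) per
    component model, aligned with the held-out word stream `heldout`.
    Returns weights that locally maximize held-out likelihood, i.e. minimize
    held-out perplexity."""
    m = len(component_probs)
    weights = [1.0 / m] * m
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each word.
        counts = [0.0] * m
        for t in range(len(heldout)):
            mix = sum(w * p[t] for w, p in zip(weights, component_probs))
            for k in range(m):
                counts[k] += weights[k] * component_probs[k][t] / mix
        # M-step: renormalize responsibilities into new weights.
        total = sum(counts)
        weights = [c / total for c in counts]
    return weights

# Toy example: probabilities of 4 held-out words under 2 source models.
probs_source_a = [0.010, 0.002, 0.008, 0.004]
probs_source_b = [0.001, 0.009, 0.002, 0.007]
heldout_words = ["w1", "w2", "w3", "w4"]
print(tune_interpolation_weights([probs_source_a, probs_source_b], heldout_words))
```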
“…To further improve robustness to varying styles or tasks, unsupervised test-set adaptation, for example, to a particular broadcast show, may be used (Della Pietra et al., 1992; Bulyko et al., 2007, 2012; Federico, 1999, 2003; Gildea and Hofmann, 1999; Chen et al., 2001; Mrva and Woodland, 2004, 2006; Chien et al., 2005; Tam and Schultz, 2005; Liu et al., 2007, 2008, 2009, 2010). As directly adapting n-gram word probabilities is impractical on limited amounts of data, standard adaptation schemes only involve updating a single, context-independent interpolation weight for the component models (Iyer et al., 1994; Rosenfeld, 1996; Clarkson and Robinson, 1997; Seymore and Rosenfeld, 1997; Iyer and Ostendorf, 1999; Mrva and Woodland, 2006).…”
Section: Introduction (mentioning)
confidence: 99%
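The "single, context-independent interpolation weight" update described here can be sketched as a one-parameter EM re-estimation on the unsupervised adaptation text, for example a first-pass transcript of one broadcast show. The names and numbers below are assumptions for illustration, not the exact schemes of the cited papers.

```python
def adapt_single_weight(p_background, p_component, lam=0.5, n_iter=10):
    """p_background, p_component: per-word probabilities of the unsupervised
    adaptation text under the two models. Returns the re-estimated weight on
    the component model; the background model gets (1 - lam)."""
    for _ in range(n_iter):
        post_sum = 0.0
        for pb, pc in zip(p_background, p_component):
            mix = (1 - lam) * pb + lam * pc
            post_sum += lam * pc / mix        # posterior of the component model
        lam = post_sum / len(p_component)     # single, context-independent update
    return lam

# e.g. word probabilities from a first-pass hypothesis of one show
lam_show = adapt_single_weight([0.004, 0.002, 0.006], [0.010, 0.001, 0.012])
print(lam_show)
```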
“…There are several references showing the effectiveness of monolingual topic-dependent language models [cf., e.g., Iyer and Ostendorf 1999], and our approach may be regarded as similar to the monolingual topic-dependent language model. This motivates us to construct topic-dependent LMs and contrast their performance with our models.…”
Section: Topic-dependent Language Models (mentioning)
confidence: 99%
“…We can place this type of modeling within our adaptation framework by viewing the first-pass hypothesis transcription of an article to be another topic adaptation text. We can adapt our … This procedure is a crude but quick approximation to maximum entropy training with this feature set. It would be more sound (but vastly more expensive) to set the parameters using a true maximum entropy training algorithm.…”
Section: n-Gram Probabilities (mentioning)
confidence: 99%
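One concrete and widely used way to treat the first-pass hypothesis transcription as adaptation text is a unigram cache interpolated with the static model; the cited work instead approximates a maximum entropy formulation, so the sketch below is a simplified stand-in. The function name, cache weight, and example sentence are hypothetical.

```python
from collections import Counter

def cached_prob(word, static_prob, first_pass_words, cache_weight=0.1):
    """P(word) = (1 - mu) * P_static(word | history) + mu * P_cache(word),
    where P_cache is a unigram distribution over the first-pass transcript."""
    counts = Counter(first_pass_words)
    p_cache = counts[word] / len(first_pass_words) if first_pass_words else 0.0
    return (1 - cache_weight) * static_prob + cache_weight * p_cache

# The first-pass hypothesis of an article serves as the adaptation text.
hypothesis = "stocks fell as the market reacted to the fed decision".split()
print(cached_prob("market", static_prob=0.003, first_pass_words=hypothesis))
```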
“…Numerous efforts have demonstrated large improvements in the measure of perplexity [2, 4, 9]; however, perplexity has been shown to correlate poorly with speech recognition performance. Several papers have reported modest speech recognition word-error rate (WER) improvements of about 0.5% absolute: Sekine and Grishman [14] add ad hoc topic and cache scores to their language model score in log probability space, and Iyer and Ostendorf [3] … This work was supported by the National Security Agency under grants MDA904-96-1-0113 and MDA904-97-1-0006. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. government.…”
Section: Introduction (mentioning)
confidence: 99%
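The "ad hoc ... scores in log probability space" combination attributed to [14] amounts to a weighted sum of log scores. A minimal sketch, with illustrative weights rather than the published ones:

```python
import math

def combined_log_score(p_ngram, p_topic, p_cache, w_topic=0.3, w_cache=0.2):
    """Ad hoc combination in log probability space: topic and cache scores are
    simply added to the n-gram log score with tunable weights. The weights and
    probabilities here are illustrative placeholders."""
    return (math.log(p_ngram)
            + w_topic * math.log(p_topic)
            + w_cache * math.log(p_cache))

print(combined_log_score(p_ngram=0.004, p_topic=0.01, p_cache=0.02))
```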