Stanley F. Chen scite author profile

We present an extensive empirical comparison of several smoothing techniques in the domain of language modeling, including those described by Jelinek and Mercer (1980), Katz (1987), and Church and Gale (1991). We investigate for the first time how factors such as training data size, corpus (e.g., Brown versus Wall Street Journal), and n-gram order (bigram versus trigram) affect the relative performance of these methods, which we measure through the cross-entropy of test data. In addition, we introduce two novel smoothing techniques, one a variation of Jelinek-Mercer smoothing and one a very simple linear interpolation technique, both of which outperform existing methods.
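The evaluation setup described in this abstract can be illustrated with a minimal sketch. It interpolates a maximum-likelihood bigram estimate with a unigram estimate using one fixed weight, then scores test data by cross-entropy in bits per word. This is not either of the paper's novel methods: the fixed weight `lam`, the add-one unigram fallback, the toy corpus, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter

def train_counts(tokens):
    """Collect unigram, context, and bigram counts from a token list."""
    unigrams = Counter(tokens)
    contexts = Counter(tokens[:-1])  # only words that have a successor
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, contexts, bigrams

def interpolated_prob(word, prev, counts, vocab, lam=0.7):
    """P(word | prev) = lam * P_ML(word | prev) + (1 - lam) * P_uni(word).

    Assumes every training token is in vocab. P_uni is add-one smoothed
    so that unseen vocabulary words keep a nonzero probability.
    """
    unigrams, contexts, bigrams = counts
    total = sum(unigrams.values())
    p_uni = (unigrams[word] + 1) / (total + len(vocab))
    if contexts[prev] == 0:
        # Unseen context: fall back entirely to the unigram estimate.
        return p_uni
    p_ml = bigrams[(prev, word)] / contexts[prev]
    return lam * p_ml + (1 - lam) * p_uni

def cross_entropy(test_tokens, counts, vocab, lam=0.7):
    """Average negative log2 probability per predicted word."""
    log_sum = 0.0
    for prev, word in zip(test_tokens, test_tokens[1:]):
        log_sum += math.log2(interpolated_prob(word, prev, counts, vocab, lam))
    return -log_sum / (len(test_tokens) - 1)

# Toy data (hypothetical); "d" never occurs in training.
train = "a b a b a c".split()
vocab = {"a", "b", "c", "d"}
counts = train_counts(train)
print(cross_entropy("a b a d".split(), counts, vocab))
```

Lower cross-entropy means the model assigns higher probability to the test data. Jelinek-Mercer smoothing as usually described estimates the interpolation weights on held-out data rather than fixing them by hand.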


An empirical study of smoothing techniques for language modeling

Chen

Goodman

1996

648

477


A Gaussian Prior for Smoothing Maximum Entropy Models

Chen

Rosenfeld

1999

229

140


In certain contexts, maximum entropy (ME) modeling can be viewed as maximum likelihood training for exponential models, and like other maximum likelihood methods is prone to overfitting of training data. Several smoothing methods for maximum entropy models have been proposed to address this problem, but previous results do not make it clear how these smoothing methods compare with smoothing methods for other types of related models. In this work, we survey previous work in maximum entropy smoothing and compare the performance of several of these algorithms with conventional techniques for smoothing n-gram language models. Because of the mature body of research in n-gram model smoothing and the close connection between maximum entropy and conventional n-gram models, this domain is well-suited to gauge the performance of maximum entropy smoothing methods. Over a large number of data sets, we find that an ME smoothing method proposed to us by Lafferty [1] performs as well as or better than all other algorithms under consideration. This general and efficient method involves using a Gaussian prior on the parameters of the model and selecting maximum a posteriori instead of maximum likelihood parameter values. We contrast this method with previous n-gram smoothing methods to explain its superior performance.
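The idea of choosing maximum a posteriori rather than maximum likelihood parameters can be sketched on a toy exponential model. The example below is an illustration under stated assumptions, not the paper's training procedure: it uses plain gradient ascent, one indicator feature per outcome, and hypothetical names (`model_probs`, `fit_map`, `sigma2`). The Gaussian prior with variance `sigma2` adds the term `-weights[i] / sigma2` to the log-likelihood gradient.

```python
import math

def model_probs(weights, features):
    """p(x) proportional to exp(sum_i weights[i] * features[x][i])."""
    scores = [sum(w * f for w, f in zip(weights, fx)) for fx in features]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def fit_map(counts, features, sigma2, steps=2000, lr=0.05):
    """Gradient ascent on log-likelihood minus sum_i w_i^2 / (2 * sigma2)."""
    n = sum(counts)
    k = len(features[0])
    weights = [0.0] * k
    # Observed feature totals in the training data.
    observed = [sum(c * fx[i] for c, fx in zip(counts, features)) for i in range(k)]
    for _ in range(steps):
        probs = model_probs(weights, features)
        for i in range(k):
            expected = n * sum(p * fx[i] for p, fx in zip(probs, features))
            # The last term comes from the Gaussian prior.
            grad = observed[i] - expected - weights[i] / sigma2
            weights[i] += lr * grad / n
    return weights

# Toy data (hypothetical): three outcomes, the third never observed.
features = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
counts = [8, 2, 0]

p_map = model_probs(fit_map(counts, features, sigma2=1.0), features)
# A very large variance makes the prior nearly flat, approximating ML.
p_ml = model_probs(fit_map(counts, features, sigma2=1e6), features)
print(p_map, p_ml)
```

With the prior in place, the unseen outcome keeps noticeably more probability mass than under the near-maximum-likelihood fit, which is the smoothing effect the abstract refers to.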


Aligning sentences in bilingual corpora using lexical information

Chen

1993

136


In this paper, we describe a fast algorithm for aligning sentences with their translations in a bilingual corpus. Existing efficient algorithms ignore word identities and only consider sentence length (Brown et al., 1991b; Gale and Church, 1991). Our algorithm constructs a simple statistical word-to-word translation model on the fly during alignment. We find the alignment that maximizes the probability of generating the corpus with this translation model. We have achieved an error rate of approximately 0.4% on Canadian Hansard data, which is a significant improvement over previous results. The algorithm is language independent.


Whole-sentence exponential language models: a vehicle for linguistic-statistical integration

Rosenfeld

Chen

Zhu

2001

Computer Speech & Language


We introduce an exponential language model which models a whole sentence or utterance as a single unit. By avoiding the chain rule, the model treats each sentence as a "bag of features", where features are arbitrary computable properties of the sentence. The new model is computationally more efficient, and more naturally suited to modeling global sentential phenomena, than the conditional exponential (e.g. maximum entropy) models proposed to date. Using the model is straightforward. Training the model requires sampling from an exponential distribution. We describe the challenge of applying Monte Carlo Markov Chain and other sampling techniques to natural language, and discuss smoothing and step-size selection. We then present a novel procedure for feature selection, which exploits discrepancies between the existing model and the training corpus. We demonstrate our ideas by constructing and analysing competitive models in the Switchboard and Broadcast News domains, incorporating lexical and syntactic information.
