2004
DOI: 10.1023/b:inrt.0000011209.19643.e2

Augmenting Naive Bayes Classifiers with Statistical Language Models

Abstract: We augment naive Bayes models with statistical n-gram language models to address shortcomings of the standard naive Bayes text classifier. The result is a generalized naive Bayes classifier which allows for a local Markov dependence among observations; a model we refer to as the Chain Augmented Naive Bayes (CAN) classifier. CAN models have two advantages over standard naive Bayes classifiers…

* Most research was conducted while the authors were at the School of Computer Science, University of Waterloo, Canada.
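To make the chain-augmented idea concrete, here is a minimal sketch: each class trains its own n-gram language model over its training documents, and a new document is assigned to the class maximizing log prior plus log likelihood under that class's model. The bigram order, add-alpha smoothing, and all names here are illustrative assumptions, not the paper's exact configuration.

import math
from collections import defaultdict

class BigramClassConditionalLM:
    def __init__(self, alpha=0.5):
        self.alpha = alpha                 # add-alpha smoothing constant (assumed)
        self.bigram = defaultdict(int)     # (prev, word) counts
        self.unigram = defaultdict(int)    # prev-word counts
        self.vocab = set()

    def train(self, docs):
        for doc in docs:
            tokens = ["<s>"] + doc.split()
            for prev, word in zip(tokens, tokens[1:]):
                self.bigram[(prev, word)] += 1
                self.unigram[prev] += 1
                self.vocab.update((prev, word))

    def log_prob(self, doc):
        tokens = ["<s>"] + doc.split()
        v = len(self.vocab) + 1
        lp = 0.0
        for prev, word in zip(tokens, tokens[1:]):
            # smoothed conditional probability P(word | prev)
            num = self.bigram[(prev, word)] + self.alpha
            den = self.unigram[prev] + self.alpha * v
            lp += math.log(num / den)
        return lp

def train_can(labeled_docs):
    """labeled_docs: list of (text, label). Returns per-class models and log priors."""
    by_class = defaultdict(list)
    for text, label in labeled_docs:
        by_class[label].append(text)
    models, priors = {}, {}
    total = len(labeled_docs)
    for label, docs in by_class.items():
        lm = BigramClassConditionalLM()
        lm.train(docs)
        models[label] = lm
        priors[label] = math.log(len(docs) / total)
    return models, priors

def classify(text, models, priors):
    # argmax over classes of log P(class) + log P(text | class)
    return max(models, key=lambda c: priors[c] + models[c].log_prob(text))

The local Markov dependence enters through the conditioning on the previous token in log_prob; dropping that conditioning recovers the standard (unigram) naive Bayes decision rule.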

Cited by 211 publications (156 citation statements)
References: 43 publications
“…More recently, Graham et al (2005) and Zheng et al (2006) used neural networks on a wide variety of features. Other studies used k-nearest neighbor (Kjell et al 1995; Hoorn et al 1999; Zhao & Zobel 2005), Naive Bayes (Kjell 1994a; Hoorn et al 1999; Peng et al 2004), rule learners (Holmes & Forsyth 1995; Holmes 1998; Argamon et al 1998; Koppel & Schler 2003; Abbasi & Chen 2005; Zheng et al 2006), support vector machines (De Vel et al 2001; Diederich et al 2003; Koppel & Schler 2003; Abbasi & Chen 2005; Koppel et al 2005; Zheng et al 2006), Winnow (Koppel et al 2002; Argamon et al 2003; Koppel et al 2006a), and Bayesian regression (Madigan et al 2006; Argamon et al 2008). Further details regarding these studies can be found in the Appendix.…”
Section: Machine Learning Approach
confidence: 99%
“…An alternative way to automatically define the function word set is to extract the most frequent words in a corpus [24,29]. There are also attempts to use word n-grams to exploit contextual information [27,7]. However, this process considerably increases the dimensionality of the problem and has not produced encouraging results so far.…”
Section: Previous Work
confidence: 99%
“…Such powerful machine learning algorithms can effectively cope with high dimensional and sparse data. Another approach is to apply a generative model, like a naïve Bayes model [27]. Yet another approach is to estimate the similarity between two texts [4,17].…”
Section: Previous Work
confidence: 99%
“…Researchers [2,14,15] have previously used language models for document classification and such an approach was essentially Bayesian. We too adopt a Bayesian approach but, in common with most IR applications, apply models that are unigram in that they consider each term independently and do not take account of the preceding tokens.…”
Section: Document Generation
confidence: 99%
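For contrast with the chain-augmented sketch above, the unigram Bayesian view described in this last citing passage scores each term independently of the preceding tokens. A minimal sketch, with hypothetical class counts and add-alpha smoothing chosen for illustration:

import math
from collections import Counter

def unigram_log_prob(doc, class_counts, alpha=1.0):
    """Add-alpha smoothed log P(doc | class) under a unigram model."""
    total = sum(class_counts.values())
    vocab_size = len(class_counts) + 1
    lp = 0.0
    for word in doc.split():
        lp += math.log((class_counts[word] + alpha) /
                       (total + alpha * vocab_size))
    return lp

# hypothetical per-class term counts built from training documents
spam_counts = Counter("buy now cheap offer buy".split())
ham_counts = Counter("meeting agenda project notes".split())

doc = "cheap offer now"
pred = max([("spam", spam_counts), ("ham", ham_counts)],
           key=lambda kv: unigram_log_prob(doc, kv[1]))[0]

Because each term is scored on its own, word order carries no weight here; that is exactly the independence assumption the CAN model relaxes by conditioning on preceding tokens.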