Proceedings of the 22nd International Conference on Computational Linguistics - COLING '08 2008
DOI: 10.3115/1599081.1599085

An improved hierarchical Bayesian model of language for document classification

Abstract: This paper addresses the fundamental problem of document classification, and we focus attention on classification problems where the classes are mutually exclusive. In the course of the paper we advocate an approximate sampling distribution for word counts in documents, and demonstrate the model's capacity to outperform both the simple multinomial and more recently proposed extensions on the classification task. We also compare the classifiers to a linear SVM, and show that provided certain conditions are met,…

citations
Cited by 6 publications
(20 citation statements)
references
References 14 publications
“…For all four datasets, we use 10-fold cross-validation to make maximal use of the data and to allow comparison with the previous work by Allison [9]. Ten obtained values of performance are averaged to give the final result.…”
Section: ISRN Artificial Intelligence (mentioning)
confidence: 99%
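The evaluation protocol quoted above (10-fold cross-validation, with the ten per-fold scores averaged into one result) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the citing authors' code: the names `k_fold_indices`, `cross_validate`, `train_majority`, and `accuracy` are hypothetical, and the majority-class baseline stands in for whatever classifier is actually being evaluated.

```python
import random
from statistics import mean


def k_fold_indices(n_items, k=10, seed=0):
    """Shuffle range(n_items) and split it into k disjoint, near-equal folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(items, labels, train_fn, score_fn, k=10, seed=0):
    """Train on k-1 folds, score on the held-out fold, and average the k scores."""
    scores = []
    for held_out in k_fold_indices(len(items), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(items)) if i not in held]
        model = train_fn([items[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        scores.append(score_fn(model,
                               [items[i] for i in held_out],
                               [labels[i] for i in held_out]))
    return mean(scores), scores


# Hypothetical stand-in classifier: always predict the most frequent training label.
def train_majority(docs, labels):
    return max(set(labels), key=labels.count)


def accuracy(model, docs, labels):
    return sum(1 for y in labels if y == model) / len(labels)
```

Every document is held out exactly once, so the averaged score uses all of the data for testing while never scoring a model on documents it was trained on.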
“…The beta-binomial distribution model is derived with consideration of a serious drawback in Dirichlet-multinomial modeling [9]. If we use the Dirichlet distribution, (6), to describe the probability density of , then it is concluded that words having the same expectation value in also have the same variance in ; that is,…”
Section: Beta-binomial Model (mentioning)
confidence: 99%
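The drawback described in the quoted statement follows from the standard moments of the Dirichlet distribution. The symbols in the quotation were lost in extraction, so the notation below is an assumption made for illustration: $\theta = (\theta_1, \dots, \theta_V)$ denotes the word probabilities, $\alpha_1, \dots, \alpha_V$ the Dirichlet parameters, and $\alpha_0 = \sum_j \alpha_j$ their sum.

```latex
\begin{align}
  \mathbb{E}[\theta_i] &= \frac{\alpha_i}{\alpha_0}, \\
  \operatorname{Var}[\theta_i]
    &= \frac{\alpha_i\,(\alpha_0 - \alpha_i)}{\alpha_0^{2}\,(\alpha_0 + 1)}
     = \frac{\mathbb{E}[\theta_i]\,\bigl(1 - \mathbb{E}[\theta_i]\bigr)}{\alpha_0 + 1}.
\end{align}
```

Because $\alpha_0$ is shared by all words, the variance of $\theta_i$ is determined entirely by its mean: two words with equal expectation necessarily have equal variance, which is the restriction the quoted statement attributes to Dirichlet-multinomial modeling.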