2006 Fourth Latin American Web Congress
DOI: 10.1109/la-web.2006.11

Contextual Entropy and Text Categorization

Abstract: In this paper we describe a new approach to text categorization; our focus is on the amount of information (the entropy) in the text. The entropy is computed from the empirical distribution of words in the text. We provide the system with a manually segmented collection of documents in different categories. For each category a separate empirical distribution of words is computed, and we use these empirical distributions for categorization purposes. If we compute the entropy of the test document for each empiric…
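
The abstract is cut off above, so the paper's exact decision rule is not quoted here. The following is a minimal sketch of one plausible reading: build a smoothed empirical word distribution per category, then assign a test document to the category under which its words have the lowest average cross-entropy. Every name, the add-alpha smoothing, and the decision rule itself are illustrative assumptions, not the authors' confirmed method.

```python
import math
from collections import Counter

# Minimal sketch of an entropy-based categorizer in the spirit of the
# abstract above. The decision rule (lowest average cross-entropy wins)
# is an assumed reading of the truncated abstract, not the authors'
# confirmed method; the add-alpha smoothing is likewise illustrative.

def fit_category(docs, alpha=1.0):
    """Smoothed empirical word distribution for one category."""
    counts = Counter(w for doc in docs for w in doc.split())
    total = sum(counts.values())
    vocab = len(counts)
    def prob(w):
        # Add-alpha smoothing so unseen words keep nonzero probability.
        return (counts[w] + alpha) / (total + alpha * (vocab + 1))
    return prob

def cross_entropy(doc, prob):
    """Average bits per word of `doc` under a category's distribution."""
    words = doc.split()
    return -sum(math.log2(prob(w)) for w in words) / max(len(words), 1)

def categorize(doc, models):
    # Lower cross-entropy means the document fits the category better.
    return min(models, key=lambda c: cross_entropy(doc, models[c]))

# Usage with a toy, manually segmented collection:
training = {
    "sports": ["the team won the game", "players scored a goal"],
    "finance": ["the market fell on bank news", "stocks and bonds rose"],
}
models = {c: fit_category(docs) for c, docs in training.items()}
print(categorize("the team scored", models))  # -> 'sports'
```

Cross-entropy is used here because the truncated abstract speaks of computing "the entropy of the test document for each empiric[al distribution]", which this quantity approximates when the probabilities come from the category rather than the document itself.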

Cited by 4 publications (5 citation statements)
References 20 publications

“…Table 2 and Figure 4 present the average accuracy of this experiment. In Figure 4 we can see the behavior of the CE classifier for different values of n; as in [4], the best results are obtained for the first values of n.…”
Section: Methods
confidence: 89%
“…In previous work [4] we have shown that entropy is effective in text categorization, for formal as well as intuitive reasons. Entropy is a statistic that depends both on the object itself (the text) and on the context (the vocabulary distribution).…”
Section: Contextual Entropy
confidence: 99%
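
This quoted sentence pins down what "contextual" means: the statistic ranges over the words of the document but takes its probabilities from a category's vocabulary distribution. One plausible formalization, in our own notation rather than the paper's (d is a test document and \hat{p}_c is the empirical word distribution of category c):

H(d; c) = -\sum_{w \in d} \hat{p}_c(w) \log_2 \hat{p}_c(w)

The same document thus receives a different entropy in each category context, which is what the categorization rule exploits.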
“…To calculate the entropy in (4), one needs to know the data distribution first. According to the maximum entropy criterion, if the data distribution is unknown, the natural choice is the uniform distribution [46]. This assumption may cause the estimate of the cluster number to deviate from the true number of clusters, since the true data distribution is not uniform.…”
Section: Entropy-based K-means Algorithm (Automatically Deciding O…
confidence: 99%
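
As a quick aside on the maximum-entropy claim in this last statement: among all distributions over k outcomes, the uniform one has the highest entropy, log2(k), which is why it is the default when the distribution is unknown. A short illustrative check (the skewed distribution below is made-up data):

```python
import math

def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

k = 4
uniform = [1 / k] * k
skewed = [0.7, 0.1, 0.1, 0.1]  # made-up, non-uniform data
print(entropy(uniform))  # 2.0 == log2(4): the maximum for k = 4 outcomes
print(entropy(skewed))   # ~1.357: lower, as real data is rarely uniform
```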