2017
DOI: 10.1007/978-3-319-72926-8_9

A Comparative Study on Term Weighting Schemes for Text Classification

Abstract: Text Classification (or Text Categorization) is a popular machine learning task. It consists of assigning categories to documents. In this paper, we are interested in comparing state-of-the-art classifiers and state-of-the-art feature weights. Feature weighting methods are classic tools used in text categorization. We extend previous studies by evaluating numerous term weighting schemes for state-of-the-art classification methods. We aim at providing a complete survey on text classification for fair benc…

Cited by 8 publications (9 citation statements)
References: 14 publications
“…Another study by Lan et al. in [6] confirmed the superiority of tf.idf over tf.chi. A recent and fair comparison between state-of-the-art TWS [7] has shown results similar to those in [3]. However, in [4], Deng et al. concluded, unlike Debole, that tf.chi is superior to tf.idf.…”
Section: Introduction (mentioning)
confidence: 66%
“…Since unsupervised term weighting methods are used extensively in exam question classification, only the results obtained with unsupervised term weighting methods will be analyzed. In [19], the authors conduct their comparative study using different unsupervised term weighting methods. The unsupervised term weighting methods used in this study are TF and TF-IDF.…”
Section: Related Work In Text Classification (mentioning)
confidence: 99%
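To make the two unsupervised schemes named in this statement concrete, here is a minimal sketch of TF and TF-IDF weighting. It assumes the common idf form log(N / df), which the statement itself does not specify; the function names and the toy corpus are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tf_weights(doc_tokens):
    """Raw term frequency: number of occurrences of each term in one document."""
    return dict(Counter(doc_tokens))


def tfidf_weights(corpus):
    """TF-IDF weights per document, assuming idf(t) = log(N / df(t)).

    N is the number of documents and df(t) is the number of documents
    containing term t. Other idf variants exist; this one is an assumption.
    """
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))  # count each term at most once per document
    weighted = []
    for doc in corpus:
        tf = Counter(doc)
        weighted.append({t: f * math.log(n_docs / df[t]) for t, f in tf.items()})
    return weighted


# Hypothetical toy corpus of tokenized exam questions.
corpus = [
    ["define", "the", "term", "entropy"],
    ["explain", "the", "term", "weighting", "scheme"],
    ["compare", "the", "two", "weighting", "schemes"],
]
weights = tfidf_weights(corpus)
```

Under this idf variant, a term that occurs in every document (here "the") receives weight zero, while a term confined to a single document gets the largest idf factor.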
“…Unsupervised TWSs are generally borrowed from the Information Retrieval domain [26] and adopted for TC [22,7,23].…”
Section: Term Weighting Schemes (mentioning)
confidence: 99%
“…Besides the raw count ($f_{t,d}$) representation of tf, there exist numerous other variants, such as the binary representation ($w_{i,j} = 1$ if the term $t_i$ occurs in the document $d_j$ and 0 otherwise), $\log(f_{i,j}) + 1$, and $f_{i,j} / \sum_{t' \in d} f_{t',d}$. All these variants are also used as TWS on their own [26,22,7,8]. The inverse document frequency also has a number of variants such as log(N/ [26].…”
Section: Term Weighting Schemes (mentioning)
confidence: 99%
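The tf variants listed in this statement can be sketched for a single tokenized document as follows. This is an illustrative sketch only: the function name `tf_variants` and the dictionary keys are hypothetical, and the natural logarithm is assumed because the statement does not give a base.

```python
import math
from collections import Counter


def tf_variants(doc_tokens):
    """Compute four tf variants for every term that occurs in the document.

    Terms absent from the document have weight 0 under every variant and
    are simply not listed in the result.
    """
    counts = Counter(doc_tokens)
    total = sum(counts.values())  # document length in tokens
    return {
        term: {
            "raw": f,                  # raw count
            "binary": 1,               # 1 because the term occurs in the document
            "log": math.log(f) + 1,    # log(f) + 1, natural log assumed
            "normalized": f / total,   # count divided by the sum of all counts
        }
        for term, f in counts.items()
    }


variants = tf_variants(["a", "a", "b", "c"])
```

The logarithmic variant dampens the effect of repeated terms, while the normalized variant makes weights comparable across documents of different lengths.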