2004
DOI: 10.1016/j.ins.2003.07.007

Effect of term distributions on centroid-based text categorization

Cited by 66 publications (58 citation statements)
References 29 publications
“…We conclude by highlighting another difference with respect to related papers by Joachims (1997), Han and Karypis (2000) and Lertnattee and Theeramunkong (2004), where all features are used in their experiments. 8 In this work, features are preliminarily filtered and only those deemed most discriminant actually contribute to the classification.…”
mentioning
confidence: 78%
“…The control parameters β and γ define the relative impact of positive and negative examples in the definition of the class prototype. Dumais, Platt, Heckerman and Sahami (1998), Joachims (1997), Han and Karypis (2000), Lertnattee and Theeramunkong (2004) set β to 1 and γ to 0, so that the prototype of a class coincides with the centroid of its positive training examples. In this work we follow this mainstream and compute the classification score as the cosine similarity between the document vector and the centroid of a class.…”
Section: Centroid-based Classifier
mentioning
confidence: 99%
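The setting described in the statement above (β = 1, γ = 0, cosine similarity against the class centroid) can be illustrated with a minimal Python sketch. The function names, the sparse dictionary representation, and the toy training data are all assumptions made for illustration; none of them come from the cited papers.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Average sparse term-weight dicts of the positive examples only.

    This corresponds to beta = 1 and gamma = 0: negative examples
    contribute nothing to the class prototype.
    """
    total = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    n = len(vectors)
    return {term: weight / n for term, weight in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(doc, centroids):
    """Return the label whose centroid is most similar to the document."""
    return max(centroids, key=lambda label: cosine(doc, centroids[label]))


# Hypothetical toy data: two classes with two training documents each.
training = {
    "sports": [{"ball": 2.0, "team": 1.0}, {"ball": 1.0, "goal": 1.0}],
    "finance": [{"stock": 2.0, "market": 1.0}, {"stock": 1.0, "bank": 1.0}],
}
centroids = {label: centroid(docs) for label, docs in training.items()}
predicted = classify({"ball": 1.0, "goal": 2.0}, centroids)
```

Here `predicted` is the label of the nearest centroid for the toy query document. A real system would build the document vectors from a term weighting scheme such as TF-IDF before computing centroids.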
“…The part used in the test determines whether the classifier is assigning the document to the correct classes according to their characteristics (LERTNATTEE; THEERAMUNKONG, 2004; SHANKAR; KARYPIS, 2000). Thus, the document class is given by:…”
Section: Model Configurations
mentioning
confidence: 99%
“…From a technical standpoint, more than one cluster may be selected, given that other subtopics can emerge from the clusters produced by grouping the tweets, and these can provide additional informative value when weighting the important sentences that make up the summary. To make use of more than one cluster, a weighting technique based on term distribution on centroid based [6] is applied. This method applies intra-cluster, inter-cluster, and whole-collection weighting so that it can raise the weight of discriminative terms; in other words, it can form a better weight representation, one that is sensitive to the trending-issue clusters that are formed.…”
Section: Introduction
unclassified
“…The TDCB method uses term distributions at the intra-class, inter-class, and whole-collection levels to raise the weight of discriminative terms [6]. Each term has a weight according to its document frequency (intra-class information) and a discriminative factor that is inversely proportional to the number of classes or clusters containing the term (inter-class information).…”
Section: A Term Distribution On Centroid Based
unclassified
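The statement above names two ingredients, a within-class document frequency and a factor inversely proportional to the number of classes containing the term, but does not give the exact formula. The Python sketch below simply multiplies the two, which is an assumption made for illustration and not the formula from the cited paper; the whole-collection component is left out, and the function name and toy data are hypothetical.

```python
from collections import defaultdict


def term_weights(docs_by_class):
    """Map each label to {term: weight} from tokenized documents.

    weight = (documents in the class containing the term)
             / (number of classes containing the term)
    Combining the two factors by division is an illustrative assumption.
    """
    # Intra-class information: document frequency of each term per class.
    df = {label: defaultdict(int) for label in docs_by_class}
    for label, docs in docs_by_class.items():
        for tokens in docs:
            for term in set(tokens):
                df[label][term] += 1

    # Inter-class information: number of classes in which a term occurs.
    class_count = defaultdict(int)
    for counts in df.values():
        for term in counts:
            class_count[term] += 1

    return {
        label: {term: n / class_count[term] for term, n in counts.items()}
        for label, counts in df.items()
    }


# Hypothetical toy corpus: "x" occurs only in class "a", "y" in both classes.
weights = term_weights({"a": [["x", "y"], ["x"]], "b": [["y", "z"]]})
```

In this toy corpus the term shared by both classes ends up with a lower weight than the terms confined to one class, which is the discriminative effect the statement describes.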