Experiments on the Use of Feature Selection and Negative Evidence in Automated Text Categorization (2000)
DOI: 10.1007/3-540-45268-0_6

Cited by 165 publications (99 citation statements)
References 6 publications
“…Numerous feature selection/reduction approaches have been proposed [6] in order to solve this problem. The successfully used feature selection approaches include Document Frequency (DF), Mutual Information (MI), Information Gain (IG), the Chi-square test and the Galavotti, Sebastiani & Simi metric [7,8]. Furthermore, a better document representation may lead to decreasing the feature vector dimension, e.g.…”
Section: Related Work
“…6 Olsson presents in [18] a Czech-English cross-language classification on the MALACH 7 data set. Wu et al. deal in [19] with a bilingual topic aspect classification of English and Chinese news articles from the Topic Detection and Tracking (TDT) 8 collection. Unfortunately, little work on the classification of Czech documents exists.…”
Section: Related Work
“…Among these are the DIA association factor (Fuhr and Buckley 1991), chi-square (Yang and Pedersen 1997; Sebastiani, Sperduti et al 2000; Caropreso, Matwin et al 2001), the NGL coefficient (Ng, Goh et al 1997; Ruiz and Srinivasan 1999), information gain (Lewis and Ringuette 1994; Moulinier, Raskinis et al 1996; Yang and Pedersen 1997; Larkey 1998; Mladenic and Grobelnik 1998; Caropreso, Matwin et al 2001), mutual information (Larkey and Croft 1996; Wai and Fan 1997; Dumais, Platt et al 1998; Taira and Haruno 1999), odds ratio (Mladenic and Grobelnik 1998; Ruiz and Srinivasan 1999; Caropreso, Matwin et al 2001), relevancy score (Wiener, Pedersen et al 1995) and the GSS coefficient (Galavotti, Sebastiani et al 2000). Three of the most popular methods are described briefly below.…”
Section: This Leads to the Term Frequency/Inverse Document Frequency
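The mutual-information criterion listed in the excerpt above can be sketched from document counts in a term/class contingency table. The function name and the a/b/c cell convention below follow the usual estimate popularized by Yang and Pedersen and are illustrative, not notation taken from the cited papers:

```python
import math

def mutual_information(a: int, b: int, c: int, n: int) -> float:
    """Estimate MI(t, cls) = log( P(t, cls) / (P(t) * P(cls)) ) from counts.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that do not contain the term
    n: total number of documents
    """
    if a == 0:
        return float("-inf")  # term never co-occurs with the class
    return math.log(a * n / ((a + c) * (a + b)))
```

Under this estimate a term that is statistically independent of the class scores 0, while rare terms tend to receive inflated scores; that sensitivity to raw term frequency is one reason the comparative studies cited here often found MI weaker than chi-square or information gain.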
“…Various experimental comparisons of feature selection functions applied to TC contexts have been carried out (Yang and Pedersen 1997; Mladenic and Grobelnik 1998; Galavotti, Sebastiani et al 2000; Caropreso, Matwin et al 2001). In these experiments most functions have improved on the results of basic document frequency thresholding.…”
Section: Experimental Comparisons
“…• GSSC: The GSS (Galavotti⋅Sebastiani⋅Simi) Coefficient defined in [13] represents the core calculation as well as a simplified variant of both the Chi-square Statistics (χ²) and the Correlation Coefficient (CC) statistical FS mechanisms. In [27,30], the authors state: (i) the well-established χ² statistic can be applied to measure the lack of independence between a term u_h and a predefined class C_i; and (ii) if the feature/term and the class are independent, the calculated χ² score has the natural value 0.…”
Section: Statistical Feature Selection
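The relationship the excerpt describes between χ² and the GSS coefficient can be made concrete with a small sketch over a 2×2 term/class contingency table (the cell names a–d are the common textbook convention, not notation from [13]): GSS keeps only the χ² numerator term a·d − b·c, normalized by N², so both scores are exactly 0 when term and class are independent.

```python
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: docs in the class containing the term
    b: docs outside the class containing the term
    c: docs in the class without the term
    d: docs outside the class without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: a term or class that never varies
    return n * (a * d - c * b) ** 2 / denom

def gss(a: int, b: int, c: int, d: int) -> float:
    """GSS coefficient: P(t, cls) * P(~t, ~cls) - P(t, ~cls) * P(~t, cls).

    The 'simplified chi-square': the same numerator as chi-square but
    without the normalizing marginals in the denominator.
    """
    n = a + b + c + d
    return (a * d - b * c) / (n * n) if n else 0.0
```

Both functions return 0 for an independent term (a·d = b·c); the marginals in the χ² denominator are what make its scores comparable across terms of very different document frequency.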