2015
DOI: 10.5430/air.v4n2p143
A text feature selection method based on category-distribution divergence

Abstract: The purpose of this paper is to overcome the problem that traditional feature selection methods [such as document frequency (DF), Chi-square statistic (CHI), information gain (IG), mutual information (MI), and odds ratio (OR)] do not consider the distribution of features among different categories. The work aims at selecting the features that accurately represent the theme of texts and at improving the accuracy of classification. In this paper, we propose a text feature selection method based on Category-Distribut…
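For context on the traditional scores the abstract lists, information gain (IG) is a representative example: it measures how much knowing whether a term occurs reduces uncertainty about a document's category. A minimal sketch of per-term IG over a toy corpus (the corpus, category names, and function names below are illustrative, not from the paper):

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

def information_gain(docs, labels, term):
    """IG(t) = H(C) - [P(t) H(C|t present) + P(~t) H(C|t absent)].
    docs: list of term sets; labels: parallel list of category labels."""
    n = len(docs)
    cats = set(labels)
    h_c = entropy([labels.count(c) / n for c in cats])  # prior entropy H(C)
    present = [l for d, l in zip(docs, labels) if term in d]
    absent = [l for d, l in zip(docs, labels) if term not in d]
    h_cond = 0.0
    for part in (present, absent):
        if part:
            h_cond += (len(part) / n) * entropy(
                [part.count(c) / len(part) for c in cats])
    return h_c - h_cond

docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "law"}, {"vote", "court"}]
labels = ["sports", "sports", "politics", "politics"]
print(information_gain(docs, labels, "goal"))   # perfectly discriminative -> 1.0
print(information_gain(docs, labels, "match"))  # weaker signal, lower score
```

Note that IG, as computed here, scores each term by occurrence versus non-occurrence only; it does not model how a term's occurrences are distributed across categories, which is exactly the gap the abstract says the proposed method addresses.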


Cited by 6 publications (3 citation statements)
References 14 publications
“…Consequently, text categorization techniques based on artificial intelligence are being used more and more in a variety of applications (Perikos & Hatzilygeroudis, 2016). Of these methods, multi-label k-nearest neighbor (ML-KNN) (Zhu et al., 2020), support vector machine (SVM) (Harish & Revanasiddappa, 2017; Lu et al., 2015a), random forest (Elyan & Gaber, 2017), and decision tree (DT) (Labani et al., 2018; Lu et al., 2015b) are several popular techniques widely employed for TC. It has always been difficult to deal with numerous features in the field of TC.…”
Section: Related Work (mentioning)
Confidence: 99%
“…A comprehensive exploratory study on 22 real datasets drawn from diverse fields shows that the proposed approach improves accuracy compared with three other RF variants. A new FS approach known as CDDFS, based on category-distribution divergence, degree of membership, and degree of non-membership, is proposed in Lu et al. (2015b). By removing features with a greater non-membership degree and a minimal membership degree, this method helps retain the features with good distinguishability and high representativeness.…”
Section: Related Work (mentioning)
Confidence: 99%
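The citation statement above describes CDDFS as keeping features with high membership and low non-membership degrees. As a purely illustrative sketch (not the paper's actual formulas), one could treat the fraction of a term's occurrences concentrated in its dominant category as a membership degree, the remainder as non-membership, and filter on thresholds; all names and thresholds below are hypothetical:

```python
def membership_degrees(term_counts_by_cat):
    """term_counts_by_cat: dict mapping category -> occurrence count of one term.
    Returns (membership, non_membership) as fractions of total occurrences.
    These definitions are illustrative stand-ins for the CDDFS degrees."""
    total = sum(term_counts_by_cat.values())
    if total == 0:
        return 0.0, 0.0
    dominant = max(term_counts_by_cat.values())
    return dominant / total, (total - dominant) / total

def cddfs_style_filter(vocab_counts, min_membership=0.7, max_non_membership=0.3):
    """Keep terms concentrated in one category (high membership, low
    non-membership), i.e. terms with good distinguishability."""
    kept = []
    for term, counts in vocab_counts.items():
        mu, nu = membership_degrees(counts)
        if mu >= min_membership and nu <= max_non_membership:
            kept.append(term)
    return kept

vocab = {
    "goal": {"sports": 9, "politics": 1},   # concentrated in one category -> kept
    "said": {"sports": 5, "politics": 5},   # evenly spread -> dropped
}
print(cddfs_style_filter(vocab))  # ['goal']
```

The design point this illustrates is the one the citation statement makes: unlike occurrence-based scores, a category-distribution criterion explicitly penalizes terms spread evenly across categories.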
“…The embedded approach interacts with learning algorithms at a lower computational cost than the wrapper approach [6,10,13]. Many different criteria are applied in filter-based feature selection and dimensionality reduction, such as distance or similarity/dissimilarity criteria [12,13], information-theoretic measures, and statistical measures. Distance or similarity/dissimilarity criteria have been applied to feature selection in many application areas, such as pattern recognition, information retrieval, and the detection of phishing emails and websites.…”
Section: Introduction (mentioning)
Confidence: 99%