Cancer is one of the deadliest diseases affecting human life, and a patient is more likely to survive if it is diagnosed in its early stages. In this Letter, the authors propose a genetic search fuzzy rough (GSFR) feature selection algorithm, a hybrid of the evolutionary sequential genetic search technique and fuzzy rough set theory. The genetic operators (selection, crossover and mutation) generate candidate subsets of features from the dataset. Each generated subset is evaluated with a modified dependency function of the fuzzy rough set, based on the positive and boundary regions, which acts as the fitness function. Subset generation and evaluation continue until the best subset is found, and that subset is used to build the classification model. The selected features are applied to several classifiers, of which the fuzzy-rough nearest neighbour (FRNN) classifier performs best in terms of classification accuracy and computation time. The FRNN classifier is therefore used to compare existing feature selection algorithms against the proposed GSFR algorithm. The proposed GSFR feature selection algorithm gave more precise results than the other feature selection algorithms.
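A minimal sketch of genetic search with a fuzzy-rough fitness, assuming numeric features and a crisp class label. The abstract does not give the authors' modified dependency function (which also uses the boundary region), so this sketch swaps in the standard positive-region dependency degree: per-feature similarity 1 - |a(x) - a(y)| / range(a), combined with the min t-norm, and the Łukasiewicz implicator. The names `fuzzy_similarity`, `dependency` and `genetic_search`, the size penalty `alpha`, and all GA parameters are hypothetical choices for illustration.

```python
import random

def fuzzy_similarity(u, v, feats, ranges):
    """R_P(u, v): min t-norm over per-feature similarities 1 - |a(u) - a(v)| / range(a)."""
    sim = 1.0
    for a in feats:
        span = ranges[a] or 1.0  # guard against constant features
        sim = min(sim, max(0.0, 1.0 - abs(u[a] - v[a]) / span))
    return sim

def dependency(data, labels, feats):
    """Standard fuzzy-rough dependency degree gamma_P (positive region only, not the modified one)."""
    if not feats:
        return 0.0
    ranges = [max(col) - min(col) for col in zip(*data)]
    pos_total = 0.0
    for u, cu in zip(data, labels):
        # Lukasiewicz implicator with a crisp decision: the lower-approximation
        # membership of u is the minimum of 1 - R_P(u, v) over objects v of another class.
        pos = 1.0
        for v, cv in zip(data, labels):
            if cv != cu:
                pos = min(pos, 1.0 - fuzzy_similarity(u, v, feats, ranges))
        pos_total += pos
    return pos_total / len(data)

def genetic_search(data, labels, pop_size=20, generations=30, p_mut=0.1, alpha=0.01, seed=0):
    """Bit-mask GA: tournament selection, one-point crossover, bit-flip mutation."""
    rng = random.Random(seed)
    n_feat = len(data[0])
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            feats = [i for i, bit in enumerate(mask) if bit]
            # Hypothetical size penalty alpha: among equally dependent subsets, smaller wins.
            cache[key] = dependency(data, labels, feats) - alpha * len(feats) / n_feat
        return cache[key]

    def tournament(pop):
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    pop = [[rng.randint(0, 1) for _ in range(n_feat)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randint(1, n_feat - 1) if n_feat > 1 else 0
            child = [1 - bit if rng.random() < p_mut else bit for bit in p1[:cut] + p2[cut:]]
            children.append(child)
        pop = children
        best = max(pop + [best], key=fitness)  # keep the best subset seen so far
    return [i for i, bit in enumerate(best) if bit], fitness(best)
```

On a toy dataset where feature 0 separates the classes and feature 1 is noise, adding feature 1 does not raise the dependency degree, so the size penalty should steer the search toward the subset `[0]`.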
Background:
Data mining algorithms are extensively used to classify data; in disease prediction, achieving this with minimal computation time plays a vital role.
Objective:
The aim of this paper is to develop a classification model from a reduced set of features and instances.
Methods:
In this paper, we propose four search algorithms for feature selection. The first, the Random Global Optimal (RGO) search algorithm, searches for a continuous, globally optimal subset of features from a random population. The second, the Global and Local Optimal (GLO) search algorithm, searches for globally and locally optimal subsets of features from the population. The third, the Random Local Optimal (RLO) search algorithm, generates random, locally optimal subsets of features from a random population. Finally, the Random Global and Local Optimal (RGLO) search algorithm searches for continuous, globally and locally optimal subsets of features from a random population, combining the properties of the first three algorithms.
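The abstract does not specify how the global and local steps are carried out, so the following is only one plausible reading of an RGLO-style search, not the authors' procedure: draw a random population of feature masks, keep its best-scoring members (global step), then refine each with single-bit-flip hill climbing (local step). The function names, the "keep the top half" rule and the pluggable `score` callable are all hypothetical; in practice `score` would be a subset evaluation measure such as a consistency measure.

```python
import random

def hill_climb(mask, score):
    """Local step: flip single bits while any flip improves the score."""
    best, best_s = list(mask), score(mask)
    improved = True
    while improved:
        improved = False
        for i in range(len(best)):
            cand = best.copy()
            cand[i] ^= 1
            s = score(cand)
            if s > best_s:
                best, best_s, improved = cand, s, True
    return best, best_s

def random_global_local_search(n_feat, score, pop_size=10, seed=0):
    """Hypothetical RGLO-style search: random population, global pick, local refinement."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_feat)] for _ in range(pop_size)]
    # Global step: keep the highest-scoring half of the random population.
    ranked = sorted(population, key=score, reverse=True)[: max(1, pop_size // 2)]
    # Local step: hill-climb from each retained subset and keep the overall best.
    best, best_s = max((hill_climb(m, score) for m in ranked), key=lambda r: r[1])
    return [i for i, bit in enumerate(best) if bit], best_s
```

Dropping the local step gives something closer to the RGO description, and running the local step from a single random member resembles RLO; the combined form trades extra evaluations for robustness against poor random starts.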
The subsets of features generated by the four proposed search algorithms are evaluated using a consistency-based subset evaluation measure. An instance-based learning algorithm is then applied to the resulting reduced-feature dataset to remove instances that are redundant or irrelevant for classification.
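As a sketch of these two steps, the code below assumes the consistency measure is the inconsistency-rate formulation of Liu and Setiono (instances that agree on the selected, already discretised features but disagree on the class count as inconsistent), and it stands in an IB2-style filter (Aha et al.) for the unnamed instance-based learner: an instance is stored only if the instances stored so far misclassify it with 1-nearest-neighbour. Both choices, and the names `consistency` and `ib2_reduce`, are assumptions for illustration.

```python
from collections import Counter, defaultdict

def consistency(rows, labels, feats):
    """1 - inconsistency rate of the data projected onto the selected (discrete) features."""
    groups = defaultdict(Counter)
    for row, y in zip(rows, labels):
        groups[tuple(row[a] for a in feats)][y] += 1
    # For each value pattern, every instance outside the majority class is inconsistent.
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return 1.0 - inconsistent / len(rows)

def ib2_reduce(rows, labels):
    """IB2-style reduction: keep an instance only if the kept set misclassifies it (1-NN)."""
    if not rows:
        return []

    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    kept = [0]
    for i in range(1, len(rows)):
        nearest = min(kept, key=lambda j: sq_dist(rows[i], rows[j]))
        if labels[nearest] != labels[i]:
            kept.append(i)
    return kept  # indices of the retained instances
```

For example, with rows `[[0, 0], [0, 1], [1, 0], [1, 1]]` and labels `[0, 0, 1, 1]`, feature 0 alone is fully consistent (1.0), while feature 1 alone gives 0.5.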
A naïve Bayesian classifier is trained on the reduced features and instances, and the resulting model is validated with tenfold cross-validation.
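The validation step can be sketched with a Gaussian naïve Bayes classifier (assuming continuous expression values) and stratified tenfold cross-validation, written with the standard library only. The variance floor `eps`, the round-robin stratification and the function names are hypothetical choices; a toolkit implementation may differ in smoothing and fold assignment.

```python
import math
import random
from collections import defaultdict

def fit_gaussian_nb(X, y, eps=1e-9):
    """Per-class log prior, feature means and variances (eps is a hypothetical variance floor)."""
    by_class = defaultdict(list)
    for row, c in zip(X, y):
        by_class[c].append(row)
    model = {}
    for c, rows in by_class.items():
        cols = list(zip(*rows))
        means = [sum(col) / len(col) for col in cols]
        variances = [sum((v - m) ** 2 for v in col) / len(col) + eps for col, m in zip(cols, means)]
        model[c] = (math.log(len(rows) / len(X)), means, variances)
    return model

def predict(model, row):
    """Return the class with the largest log posterior under feature independence."""
    def log_posterior(c):
        log_prior, means, variances = model[c]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances)
        )
    return max(model, key=log_posterior)

def cross_val_accuracy(X, y, k=10, seed=0):
    """Stratified k-fold accuracy: shuffle, group by label, deal indices round-robin."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    idx.sort(key=lambda i: y[i])  # stable sort keeps the shuffled order within each class
    correct = 0
    for f in range(k):
        test = idx[f::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = fit_gaussian_nb([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in test)
    return correct / len(X)
```

Every instance is tested exactly once, so the returned value is the pooled tenfold accuracy rather than a mean of per-fold accuracies.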
Results:
Using the RGLO search algorithm with the naïve Bayesian classifier, the classification accuracy is 94.82% for the Breast, 97.4% for the DLBCL, 98.83% for the SRBCT and 98.89% for the Leukemia datasets.
Conclusion:
The features selected by RGLO search yield a higher prediction rate with less computation time than both the complete dataset and the other proposed subset generation algorithms.