Feature subset selection is of immense importance in the field of data mining. The increased dimensionality of data makes training and testing general classification methods difficult. Mining on a reduced set of attributes reduces computation time and also makes the resulting patterns easier to understand. In this paper a wrapper approach for feature selection is proposed. For the feature selection step, we used a wrapper approach with a genetic algorithm as the random search technique for subset generation, wrapped with different classifiers (induction algorithms), namely decision tree C4.5, Naïve Bayes, Bayes networks and radial basis function, as the subset evaluation mechanism, on four standard datasets: Pima Indians Diabetes, Breast Cancer, Heart Statlog and Wisconsin Breast Cancer. The relevant attributes identified by the proposed wrapper are then validated using classifiers. Experimental results show that feature subset selection with the proposed wrapper approach enhanced classification accuracy.
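The wrapper idea in this abstract (a genetic algorithm proposes feature subsets and a classifier scores each one) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the leave-one-out nearest-centroid classifier stands in for C4.5, Naïve Bayes, Bayes networks and RBF, and the function names, population size, mutation rate and tournament selection are all hypothetical choices.

```python
import random


def nearest_centroid_accuracy(X, y, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier on the masked features.

    Stand-in evaluator (an assumption); the paper wraps other classifiers.
    """
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        # Build per-class feature sums from every sample except the held-out one.
        sums, counts = {}, {}
        for k in range(len(X)):
            if k == i:
                continue
            s = sums.setdefault(y[k], [0.0] * len(cols))
            for c, j in enumerate(cols):
                s[c] += X[k][j]
            counts[y[k]] = counts.get(y[k], 0) + 1

        def dist(label):
            # Squared Euclidean distance from sample i to the class centroid.
            return sum((X[i][j] - sums[label][c] / counts[label]) ** 2
                       for c, j in enumerate(cols))

        pred = min(sums, key=dist)
        correct += pred == y[i]
    return correct / len(X)


def ga_wrapper_select(X, y, pop_size=20, generations=15, p_mut=0.1, seed=0):
    """Search boolean feature masks with a simple GA; fitness is classifier accuracy.

    Assumes at least two features and pop_size >= 2.
    """
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    best, best_fit = pop[0][:], -1.0
    for _ in range(generations):
        scored = [(nearest_centroid_accuracy(X, y, m), m) for m in pop]
        gen_fit, gen_best = max(scored, key=lambda t: t[0])
        if gen_fit > best_fit:
            best, best_fit = gen_best[:], gen_fit

        def tournament():
            # Binary tournament: the fitter of two random individuals wins.
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        pop = [best[:]]  # elitism: carry the best mask found so far forward
        while len(pop) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n)  # single-point crossover
            child = p1[:cut] + p2[cut:]
            pop.append([(not g) if rng.random() < p_mut else g for g in child])
    return best, best_fit
```

Calling `ga_wrapper_select(X, y)` on a list of numeric feature rows `X` and labels `y` returns a boolean mask and its leave-one-out accuracy. Because the classifier is retrained for every candidate mask, wrapper evaluation is expensive, which is one reason a randomized search is used in place of exhaustive enumeration.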
Neural networks are one of many data mining analytical tools that can be used to make predictions for medical data. Model selection for a neural network involves several factors, such as selecting the optimal number of hidden nodes, the relevant input variables and the optimal connection weights. This paper presents the application of a hybrid model that integrates Genetic Algorithm and Back
Exudates are one of the primary signs of diabetic retinopathy, which is a main cause of blindness and can be prevented with an early screening process. In this paper, the authors attempt to detect exudates using a back propagation neural network (BPN). The publicly available diabetic retinopathy dataset DIARETDB1 has been used in the evaluation process. To prevent the optic disk from interfering with exudate detection, the optic disk is eliminated. After preprocessing, significant features are identified from the images using two methods, decision tree and GA-CFS, and are used as input to the BPN model to classify pixels as exudate or non-exudate. The results show that BPN with features identified by the decision tree and GA-CFS approaches outperformed BPN with all inputs. The best BPN classifier performance was a sensitivity of 96.97%, a specificity of 100% and a classification accuracy of 98.45%.
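For reference, the three figures reported above follow the standard confusion-matrix definitions. The short sketch below computes them from pixel-level counts; the function name is hypothetical, and the abstract does not give the underlying counts, so none are assumed here.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn count exudate pixels classified correctly/incorrectly;
    tn/fp count non-exudate pixels classified correctly/incorrectly.
    Assumes at least one positive and one negative example.
    """
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy
```

A specificity of 100% corresponds to `fp == 0`, that is, no non-exudate pixel was labeled as exudate.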
Medical data mining has enormous potential for exploring the hidden patterns in the data sets of the medical domain. These patterns can be utilized by physicians to improve clinical diagnosis. Feature subset selection is one of the data preprocessing steps and is of immense importance in the field of data mining. For the feature subset selection step, a filter approach with a genetic algorithm (GA) and correlation-based feature selection (CFS) has been used in a cascaded fashion. The GA performs a global search over the attributes, with fitness evaluation carried out by CFS. Experimental results indicate that the feature subset identified by the proposed GA+CFS filter, when given as input to five classifiers, namely decision tree, Naïve Bayes, Bayesian, radial basis function and k-nearest neighbor, improved classification accuracy. Experiments have been carried out on four medical data sets publicly available at UCI.
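In the filter setting, the GA's fitness function is the CFS merit of a subset in place of a classifier's accuracy. A minimal sketch of that merit, k·r_cf / sqrt(k + k(k−1)·r_ff), is given below, where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation. It assumes numeric features and labels and uses absolute Pearson correlation as the correlation measure; CFS implementations for discrete attributes commonly use symmetrical uncertainty instead, and the abstract does not say which measure the authors used. The function names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (v - mb) for x, v in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((v - mb) ** 2 for v in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def cfs_merit(X, y, subset):
    """CFS merit of the feature indices in `subset` (a list of column indices)."""
    k = len(subset)
    if k == 0:
        return 0.0
    cols = {j: [row[j] for row in X] for j in subset}
    # Mean absolute feature-class correlation.
    r_cf = sum(abs(pearson(cols[j], y)) for j in subset) / k
    if k == 1:
        return r_cf  # the general formula reduces to r_cf when k == 1
    # Mean absolute correlation over all distinct feature pairs.
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = sum(abs(pearson(cols[a], cols[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)
```

The merit rewards subsets whose features correlate with the class and penalizes redundancy among the features themselves, so adding an exact duplicate of a feature already in the subset does not raise the score. Since no classifier is trained during the search, this filter fitness is much cheaper to evaluate than a wrapper's.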