Abstract: Feature selection has been widely used on many kinds of data, yielding faster and computationally cheaper classification. However, it is in microarray datasets that its advantages are most evident and most needed. In this paper we present a novel approach to feature selection based on the concept of discernibility, which we introduce to describe how well separated the classes of a dataset are. We develop and test two independent feature selection methods that follow this approach. The results of our experiments on four microarray datasets show that discernibility-based feature selection reduces the dimensionality of the datasets involved without compromising the performance of the classifiers.
Reducing the dimensionality of a dataset is an important and often challenging task. This can be done either by reducing the number of features, a task called feature selection, or by reducing the number of patterns, called data reduction. In this paper we propose methods that employ a novel concept called Discernibility to achieve these two tasks separately, with the aim of solving classification problems. The experimental results support our claim that the proposed methods are a viable alternative for dimensionality reduction, for various datasets and a variety of classifiers.
A novel method for evaluating the reliability of a classifier on a pattern is proposed based on the discernibility of a pattern's class against other classes from the pattern. Three measures of discernibility are proposed and experimentally compared with each other and with more conventional techniques based on the classification scores for class labels. The classification accuracy can be significantly enhanced through discernibility measures by using only the most reliable ('elite') patterns. It can be further boosted by forming an amalgamation of the elites of different classifiers. Improved performance is achieved at the price of rejecting many patterns. There are situations in which this price is worth paying: for example, when non-reliable predictions, however good, lead to the need for manual testing of very cumbersome and complex technical devices, or in the diagnosis of terminal human diseases. Contrary to conventional techniques for estimating reliability, the proposed measures are applicable to small datasets as well as to datasets with complex class structures on which conventional classifiers show low accuracy rates.
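
The trade-off between accuracy and rejection described in the abstract above can be illustrated with a minimal Python sketch of the conventional, score-based baseline it mentions. This is not the paper's discernibility measure, which the abstract does not define. The reliability measure used here (the gap between the top two class scores), the function names, the threshold values, and the toy scores are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def score_margin(scores: Sequence[float]) -> float:
    """Gap between the highest and second-highest class score.

    Assumed reliability measure for this sketch; needs at least two classes.
    """
    top, runner_up = sorted(scores, reverse=True)[:2]
    return top - runner_up


def predict(scores: Sequence[float]) -> int:
    """Index of the class with the highest score."""
    return max(range(len(scores)), key=lambda k: scores[k])


def select_elite(score_rows: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Indices of patterns whose score margin is at least `threshold`."""
    return [i for i, row in enumerate(score_rows) if score_margin(row) >= threshold]


def accuracy_and_coverage(
    score_rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    threshold: float,
) -> Tuple[float, float]:
    """Accuracy on the accepted ('elite') patterns and the fraction accepted."""
    elite = select_elite(score_rows, threshold)
    if not elite:
        return 0.0, 0.0  # nothing accepted: report zero accuracy and coverage
    correct = sum(1 for i in elite if predict(score_rows[i]) == labels[i])
    return correct / len(elite), len(elite) / len(labels)


# Hypothetical two-class scores for five patterns, with their true labels.
toy_scores = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8], [0.48, 0.52], [0.7, 0.3]]
toy_labels = [0, 1, 1, 0, 0]

for t in (0.0, 0.3):
    acc, cov = accuracy_and_coverage(toy_scores, toy_labels, t)
    print(f"threshold={t}: accuracy={acc:.2f}, coverage={cov:.2f}")
```

On this toy data, raising the threshold from 0.0 to 0.3 rejects the two low-margin patterns, which are also the two misclassified ones, so accuracy on the accepted patterns rises while coverage falls. A discernibility-based measure would replace `score_margin` with a different reliability score while leaving the selection step unchanged.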