Feature subset selection (FSS) has received a great deal of attention in statistics, machine learning, and data mining. Real-world data analyzed by data mining algorithms can involve a large number of redundant or irrelevant features, or simply too many features for a learning algorithm to handle efficiently. Feature selection is becoming essential as databases grow in size and complexity. The selection process is expected to bring benefits in terms of better-performing models, computational efficiency, and simpler, more understandable models. Evolutionary computation (EC) encompasses a number of naturally inspired techniques such as genetic algorithms, genetic programming, ant colony optimization, and particle swarm optimization. Such techniques are well suited to feature selection because the representation of a feature subset is straightforward and its evaluation can be easily accomplished through wrapper or filter algorithms. Furthermore, the capability of such heuristic algorithms to search large spaces efficiently is of great advantage to the feature selection problem. Here, we review the use of different EC paradigms for feature selection in classification problems. We discuss the details of each implementation, including representation, evaluation, and validation. The review enables us to uncover the best EC algorithms for FSS and to point to future research directions. © 2013 John Wiley & Sons, Ltd.
How to cite this article: WIREs Data Mining Knowl Discov 2013, 3:381-407. doi: 10.1002/widm.1106
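To make concrete how naturally a feature subset maps onto an EC individual, the following minimal sketch pairs a binary-string representation (bit i = 1 keeps feature i) with a wrapper evaluation (cross-validated accuracy of an induced classifier). It is an illustrative sketch, not an algorithm from the reviewed literature; the scikit-learn dataset, the k-NN wrapped learner, and all GA parameters below are our own assumptions.

```python
# Minimal GA wrapper feature-selection sketch (illustrative assumptions only):
# scikit-learn breast-cancer data, k-NN as the wrapped learner, and the
# population size, generation count, and operator rates are all arbitrary.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n_features = X.shape[1]

def fitness(mask):
    """Wrapper evaluation: mean cross-validated accuracy on the subset."""
    if not mask.any():            # an empty subset cannot train a model
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

# Each individual is a binary string over the features.
pop = rng.integers(0, 2, size=(20, n_features)).astype(bool)

for generation in range(15):
    scores = np.array([fitness(ind) for ind in pop])

    def pick():
        # Tournament selection: the better of two random individuals wins.
        i, j = rng.integers(0, len(pop), size=2)
        return pop[i] if scores[i] >= scores[j] else pop[j]

    children = []
    for _ in range(len(pop)):
        a, b = pick(), pick()
        cross = rng.random(n_features) < 0.5               # uniform crossover
        child = np.where(cross, a, b)
        flip = rng.random(n_features) < 1.0 / n_features   # bit-flip mutation
        children.append(np.logical_xor(child, flip))
    pop = np.array(children)

best = max(pop, key=fitness)
print(f"selected {best.sum()} of {n_features} features, "
      f"CV accuracy {fitness(best):.3f}")
```

The same skeleton accommodates the other EC paradigms discussed in this review by swapping the representation and variation operators, and a filter evaluation is obtained by replacing the cross-validation score with a data-intrinsic measure such as mutual information.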
INTRODUCTION
The problems generated by large datasets that need to be analyzed have been surfacing since the 1980s in the context of machine learning,1-4 statistics,5-9 and data mining.10,11

Structured data used for analysis generally consist of a number of features (i.e., variables, attributes, or columns) that capture the characteristics of real-world objects, and a number of representative objects or observations captured as rows or samples. Statistics, machine learning, and data mining all offer techniques that can infer models from such data. In traditional statistics, there are a few well-chosen features and a larger number of observations, often collected purposefully for an analysis task. Machine learning focuses on developing learning models and often uses small and reasonably well-formed training datasets for the learning task. Data mining, by contrast, is a discipline focused on developing or adapting analysis techniques for routinely collected real-world data, i.e., large, noisy, and uncertain data. It is worth noting that many of the techniques used in data mining originated within the statistics or machine learning communities.

Data available for analysis may include many irrelevant or redundant features. A feature is considered to be relevant...