This paper tackles the issue of objective performance evaluation of machine learning classifiers, and the impact of the choice of test instances. Given that statistical properties or features of a dataset affect the difficulty of an instance for particular classification algorithms, we examine the diversity and quality of the UCI repository of test instances used by most machine learning researchers. We show how an instance space can be visualized, with each classification dataset represented as a point in the space. The instance space is constructed to reveal pockets of hard and easy instances, and enables the strengths and weaknesses of individual classifiers to be identified. Finally, we propose a methodology to generate new test instances with the aim of enriching the diversity of the instance space, enabling potentially greater insights than can be afforded by the current UCI repository.
Selecting the most appropriate algorithm to use when attempting to solve a black-box continuous optimization problem is a challenging task. Such problems typically lack algebraic expressions, derivative information cannot be calculated, and the problem may exhibit uncertainty or noise. In many cases, the input and output variables are analyzed without considering the internal details of the problem. Algorithm selection requires expert knowledge of search algorithm efficacy and skills in algorithm engineering and statistics. Even with the necessary knowledge and skills, success is not guaranteed. In this paper, we present a survey of methods for algorithm selection in the black-box continuous optimization domain. We start the review by presenting Rice's (1976) selection framework. We describe each of the four component spaces (problem, algorithm, performance and characteristic) in terms of requirements for black-box continuous optimization problems. This is followed by an examination of exploratory landscape analysis methods that can be used to effectively extract the problem characteristics. Subsequently, we propose a classification of the landscape analysis methods based on their order, neighborhood structure and computational complexity. We then discuss applications of the algorithm selection framework and the relationship between it and algorithm portfolios, hybrid meta-heuristics, and hyper-heuristics. The paper concludes with the identification of key challenges and proposes future research directions.
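The four spaces of the selection framework can be illustrated with a minimal sketch. It is not taken from the paper: the two features, the two algorithm labels, the performance values, and the nearest-neighbour selection rule are all hypothetical, chosen only to show how a problem's characteristics might be mapped to the algorithm expected to perform best.

```python
from math import dist

# Hypothetical records: each pairs a point in the characteristic space
# (two made-up landscape features) with the measured performance of every
# algorithm in the algorithm space on that problem (lower is better).
TRAINING = [
    ((0.9, 0.1), {"cma_es": 0.02, "nelder_mead": 0.40}),
    ((0.8, 0.2), {"cma_es": 0.05, "nelder_mead": 0.35}),
    ((0.1, 0.9), {"cma_es": 0.30, "nelder_mead": 0.04}),
    ((0.2, 0.7), {"cma_es": 0.25, "nelder_mead": 0.06}),
]


def select_algorithm(features, training=TRAINING, k=1):
    """Pick the algorithm with the best mean performance on the k
    training problems closest to `features` in characteristic space."""
    # Rank known problems by Euclidean distance to the new problem.
    nearest = sorted(training, key=lambda rec: dist(rec[0], features))[:k]
    algorithms = nearest[0][1].keys()
    # Average each algorithm's performance over the selected neighbours.
    mean_perf = {
        name: sum(rec[1][name] for rec in nearest) / len(nearest)
        for name in algorithms
    }
    # The selection mapping returns the algorithm with the lowest mean.
    return min(mean_perf, key=mean_perf.get)


if __name__ == "__main__":
    print(select_algorithm((0.85, 0.15)))
    print(select_algorithm((0.15, 0.80), k=2))
```

In practice the feature vectors would come from a landscape analysis method and the selector would typically be a trained model; the nearest-neighbour rule is used here only because it keeps the mapping from characteristics to algorithm easy to follow.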