The k-nearest neighbor (k-NN) classifier is one of the most widely used classification methods thanks to several attractive features, including good generalization and easy implementation. Although simple, it often matches and even outperforms more sophisticated and complex methods. One problem with this approach is choosing an appropriate value of k. Although a good value can be obtained by cross-validation, it is unlikely that a single value is optimal over the whole space spanned by the training set. Different regions of the feature space call for different values of k because the distribution of prototypes varies: a query instance in the center of a class faces a very different situation from one near the boundary between two classes. In this brief, we present a simple yet powerful approach to setting a local value of k. We associate a potentially different k with every prototype and obtain the best value by optimizing a criterion that combines the local and global effects of the different k values in the neighborhood of the prototype. The proposed method has a fast training stage and the same testing-stage complexity as the standard k-NN rule. Experiments on a large set of problems show that this simple approach can significantly outperform the standard k-NN rule on both standard and class-imbalanced problems.
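The local-k idea from the abstract can be sketched as follows. This is an illustrative sketch, not the paper's actual method: here each prototype's k is chosen by a simple local validation heuristic over its neighbors (the function names, candidate k values, and the `n_check` parameter are assumptions), whereas the paper optimizes a criterion combining local and global effects. What the sketch does preserve is the key property stated in the abstract: at test time a query uses the k stored at its nearest prototype, so the testing-stage cost matches standard k-NN.

```python
import numpy as np

def local_k_values(X, y, candidate_ks=(1, 3, 5, 7), n_check=10):
    """Assign each training prototype its own k by scoring each candidate
    k on the prototype's nearby points (a simple local heuristic; the
    paper's criterion, combining local and global effects, is not
    reproduced here)."""
    n = len(X)
    dists = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    local_k = np.empty(n, dtype=int)
    for i in range(n):
        order = np.argsort(dists[i])
        checks = order[1:n_check + 1]  # nearby points used to score each k
        best_k, best_acc = candidate_ks[0], -1.0
        for k in candidate_ks:
            correct = 0
            for j in checks:
                # leave-one-out k-NN vote for point j
                nbrs = [m for m in np.argsort(dists[j]) if m != j][:k]
                correct += np.bincount(y[nbrs]).argmax() == y[j]
            acc = correct / len(checks)
            if acc > best_acc:
                best_k, best_acc = k, acc
        local_k[i] = best_k
    return local_k

def predict(query, X, y, local_k):
    """Classify a query using the k stored at its nearest prototype,
    so the testing-stage complexity matches standard k-NN."""
    d = np.linalg.norm(X - query, axis=1)
    k = local_k[d.argmin()]
    nbrs = np.argsort(d)[:k]
    return np.bincount(y[nbrs]).argmax()
```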
This paper presents a cooperative evolutionary approach to instance selection for instance-based learning. The model takes advantage of a recent paradigm in evolutionary computation, cooperative coevolution, which follows a philosophy similar to divide and conquer. In our method, the training set is divided into several subsets that are searched independently, while a population of global solutions relates the searches in the different subsets and keeps track of the best combinations obtained. Unlike standard methods, the proposed model does not rely on any specific distance metric or classifier algorithm. Additionally, the fitness function of the individuals considers both storage requirements and classification accuracy, and the user can balance these two objectives according to his/her specific needs by assigning different weights to each term. The method also scales well to large datasets. The proposed model compares favorably with some of the most successful standard algorithms, IB3, ICF, and DROP3, with a genetic algorithm using the CHC method, and with four recent instance-selection methods, MSS, entropy-based instance selection, IMOEA, and LVQPRU. The comparison shows a clear advantage of the proposed algorithm in terms of storage requirements, while it is at least as good as any of the other methods in terms of testing error. A large set of 50 problems from the UCI Machine Learning Repository is used for the comparison. Additionally, a study of the effect of instance label noise shows the robustness of the proposed algorithm. The major contribution of our work is showing that cooperative coevolution, thanks to its inherently modular nature, can be used to tackle large problems.
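The weighted fitness described in the abstract can be sketched roughly as follows. This is an assumed form, not the paper's exact function: it scores a candidate subset (a boolean mask over the training set) by a weighted sum of leave-one-out 1-NN accuracy and storage reduction, with the weight `w` standing in for the user-chosen balance between the two objectives. The classifier and the precise accuracy estimate are illustrative choices, since the paper deliberately leaves both open.

```python
import numpy as np

def fitness(mask, X, y, w=0.5):
    """Weighted fitness of an instance-selection candidate: w balances
    classification accuracy (here, leave-one-out 1-NN on the training
    set) against storage reduction. Both the weight and the classifier
    are placeholders for the user's own choices."""
    sel = np.flatnonzero(mask)
    if sel.size == 0:
        return 0.0
    correct = 0
    for i in range(len(X)):
        cand = sel[sel != i]  # selected instances, excluding the point itself
        if cand.size == 0:
            continue
        d = np.linalg.norm(X[cand] - X[i], axis=1)
        correct += y[cand[d.argmin()]] == y[i]
    acc = correct / len(X)
    reduction = 1.0 - sel.size / len(X)
    return w * acc + (1.0 - w) * reduction
```

With two well-separated classes, a small subset that keeps one prototype per class scores higher than keeping everything, because the reduction term rewards smaller storage while accuracy stays high.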
We show that combining cooperative coevolution with the principle of divide and conquer can be very effective, both in improving performance and in reducing computational cost.