The k-Nearest Neighbor classifier is a non-complex and widely applied data classification algorithm which does well in real-world applications. The overall classification accuracy of the k-Nearest Neighbor algorithm largely depends on the choice of the number of nearest neighbors(k). The use of a constant k value does not always yield the best solutions especially for real-world datasets with an irregular class and density distribution of data points as it totally ignores the class and density distribution of a test point’s k-environment or neighborhood. A resolution to this problem is to dynamically choose k for each test instance to be classified. However, given a large dataset, it becomes very tasking to maximize the k-Nearest Neighbor performance by tuning k. This work proposes the use of Simulated Annealing, a metaheuristic search algorithm, to select optimal k, thus eliminating the prospect of an exhaustive search for optimal k. The results obtained in four different classification tasks demonstrate a significant improvement in the computational efficiency against the k-Nearest Neighbor methods that perform exhaustive search for k, as accurate nearest neighbors are returned faster for k-Nearest Neighbor classification, thus reducing the computation time.
Classification algorithms have shown exceptional prediction results in the supervised learning area. These classification algorithms are not always efficient when it comes to real-life datasets due to class distributions. As a result, datasets for real-life applications are generally imbalanced. Several methods have been proposed to solve the problem of class imbalance. In this paper, we propose a hybrid method combining the preprocessing techniques and those of ensemble learning. The original training set is undersampled by evaluating the samples by stochastic measurement (SM) and then training these samples selected by Multilayer Perceptron to return a balanced training set. The MLPUS (Multilayer perceptron undersampling) balanced training set is aggregated using the bagging ensemble method. We applied our method to the real-life Niger_Rice dataset and forty-four other imbalanced datasets from the KEEL repository in this study. We also compared our method with six other existing methods in the literature, such as the MLP classifier on the original imbalance dataset, MLPUS, UnderBagging (combining random under-sampling and bagging), RUSBoost, SMOTEBagging (Synthetic Minority Oversampling Technique and bagging), SMOTEBoost. The results show that our method is competitive compared to other methods. The Niger_Rice real-life dataset results are 75.6, 0.73, 0.76, and 0.86, respectively, for accuracy, F-measure, G-mean, and ROC with our proposed method. In contrast, the MLP classifier on the original imbalance Niger_Rice dataset gives results 72.44, 0.82, 0.59, and 0.76 respectively for accuracy, F-measure, G-mean, and ROC.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.