The K-nearest neighbor (KNN) classifier is one of the simplest and most common classifiers, yet its performance competes with the most complex classifiers in the literature. The core of this classifier depends mainly on measuring the distance or similarity between the tested examples and the training examples. This raises a major question: which of the many available distance and similarity measures should be used for the KNN classifier? This review attempts to answer this question by evaluating the performance (measured by accuracy, precision, and recall) of the KNN using a large number of distance measures, tested on a number of real-world data sets, with and without adding different levels of noise. The experimental results show that the performance of the KNN classifier depends significantly on the distance used, with large gaps between the performances of different distances. We found that a recently proposed nonconvex distance performed the best on most data sets compared with the other tested distances. In addition, the performance of the KNN with this top-performing distance degraded by only about 20% when the noise level reached 90%; this is true for most of the other distances as well. This means that the KNN classifier using any of the top 10 distances tolerates noise to a certain degree. Moreover, the results show that some distances are less affected by the added noise than others.
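To make the role of the distance measure concrete, the following is a minimal sketch of a KNN classifier with a pluggable distance function. The three distances shown (Euclidean, Manhattan, Chebyshev) and the toy data are illustrative assumptions; they are not necessarily the measures or data sets evaluated in the review.

```python
import math
from collections import Counter

# Illustrative distance functions (assumed for this sketch).
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def knn_predict(train_X, train_y, query, k=3, distance=euclidean):
    """Classify `query` by majority vote among its k nearest training examples."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy training set (hypothetical data).
train_X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_y = ["a", "a", "a", "b", "b", "b"]

for name, dist in [("euclidean", euclidean), ("manhattan", manhattan), ("chebyshev", chebyshev)]:
    print(name, knn_predict(train_X, train_y, (0.5, 0.5), k=3, distance=dist))
```

Swapping the `distance` argument is the only change needed to compare measures, which is the kind of comparison the review reports at scale.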
The genetic algorithm (GA) is an artificial intelligence search method that uses the process of evolution and natural selection theory, and it falls under the umbrella of evolutionary computing algorithms. It is an efficient tool for solving optimization problems. Integration among GA parameters is vital for a successful GA search. Such parameters include the mutation and crossover rates, in addition to the population size. Each GA operator has its own distinct influence, and the impact of these operators depends on their probabilities; it is difficult to predefine specific ratios for each parameter, particularly for the mutation and crossover operators. This paper reviews various methods for choosing mutation and crossover ratios in GAs. Next, we define new deterministic control approaches for crossover and mutation rates, namely Dynamic Decreasing of High Mutation/Dynamic Increasing of Low Crossover (DHM/ILC), and Dynamic Increasing of Low Mutation/Dynamic Decreasing of High Crossover (ILM/DHC). The dynamic nature of the proposed methods allows the ratios of both crossover and mutation operators to change linearly as the search progresses. DHM/ILC starts with a 100% ratio for mutations and 0% for crossovers; the mutation ratio then decreases while the crossover ratio increases, so that by the end of the search process the ratios are 0% for mutations and 100% for crossovers. ILM/DHC works the same way but in the opposite direction. The proposed approach was compared with two predefined parameter-tuning methods, namely fifty-fifty crossover/mutation ratios, and the most common approach that uses static ratios such as a 0.03 mutation rate and a 0.9 crossover rate. The experiments were conducted on ten Traveling Salesman Problems (TSP). The experiments showed the effectiveness of the proposed DHM/ILC when dealing with a small population size, while the proposed ILM/DHC was found to be more effective when using a large population size. In fact, both proposed dynamic methods outperformed the predefined methods in most of the cases tested.
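The linear schedules described above can be sketched as follows. This is a minimal illustration, assuming that search progress is measured as the current generation divided by the total number of generations; the function names and that progress measure are assumptions, not details given in the abstract.

```python
def dhm_ilc(generation, max_generations):
    """DHM/ILC: mutation falls linearly from 100% to 0%, crossover rises from 0% to 100%.

    Returns (mutation_ratio, crossover_ratio). Progress is assumed to be
    generation / max_generations.
    """
    progress = generation / max_generations
    return 1.0 - progress, progress

def ilm_dhc(generation, max_generations):
    """ILM/DHC: the reverse schedule, mutation rises while crossover falls."""
    progress = generation / max_generations
    return progress, 1.0 - progress

# Print the schedule at the start, middle, and end of a 100-generation run.
for g in (0, 50, 100):
    print(g, dhm_ilc(g, 100), ilm_dhc(g, 100))
```

In a GA loop, these ratios would replace the static values (for example 0.03 and 0.9) when deciding how many offspring to produce by mutation and by crossover in each generation.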
Users and organizations find it continuously challenging to deal with distributed denial of service (DDoS) attacks. The security engineer works to keep a service available at all times by dealing with intruder attacks. The intrusion-detection system (IDS) is one of the solutions for detecting and classifying any anomalous behavior. The IDS should always be updated with the latest intruder attack deterrents to preserve the confidentiality, integrity, and availability of the service. In this paper, a new dataset is collected because there were no common data sets that contain modern DDoS attacks in different network layers, such as SIDDoS and HTTP Flood. This work incorporates three well-known classification techniques: Multilayer Perceptron (MLP), Naïve Bayes, and Random Forest. The experimental results show that MLP achieved the highest accuracy rate (98.63%).
Advances in information technology, in both hardware and software, have recently allowed big data to emerge. Classification of such data is extremely slow, particularly when using the K-nearest neighbors (KNN) classifier. In this article, we propose a new approach that creates a binary search tree (BST) to be used later by the KNN to speed up big data classification. This approach is based on finding the furthest pair of points (the diameter) in a data set, and then using this pair of points to sort the examples of the training data set into a BST. At each node of the BST, the furthest pair is found, and the examples located at that particular node are further sorted based on their distances to these local furthest points. The created BST is then searched for a test example down to a leaf; the examples found in that particular leaf are used to classify the test example using the KNN classifier. The experimental results on some well-known machine learning data sets show the efficiency of the proposed method in terms of speed and accuracy compared with the state-of-the-art methods reviewed. With some optimization, the proposed method has great potential to be used for big data classification and can be generalized to other applications, particularly when classification speed is the main concern.
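The tree construction and search described above can be sketched roughly as follows. Several details here are assumptions made for illustration: the furthest pair is found by brute force (quadratic time, so only suitable for small inputs), each example is sent to the child whose furthest point it is closer to, and splitting stops at a hypothetical `leaf_size`. The article's actual method may differ on each of these points.

```python
import math
from collections import Counter
from itertools import combinations

def furthest_pair(points):
    # Brute-force diameter search; an assumption of this sketch, O(n^2).
    return max(combinations(points, 2), key=lambda pq: math.dist(pq[0], pq[1]))

def build(examples, leaf_size=3):
    """Build a tree from (point, label) pairs; leaf_size is assumed to be >= 1."""
    if len(examples) <= leaf_size:
        return {"leaf": examples}
    p1, p2 = furthest_pair([p for p, _ in examples])
    # Assumed split rule: go left if closer to p1, otherwise right.
    left = [e for e in examples if math.dist(e[0], p1) <= math.dist(e[0], p2)]
    right = [e for e in examples if math.dist(e[0], p1) > math.dist(e[0], p2)]
    if not left or not right:  # e.g. all points identical: stop splitting
        return {"leaf": examples}
    return {"p1": p1, "p2": p2,
            "left": build(left, leaf_size), "right": build(right, leaf_size)}

def classify(tree, query, k=3):
    """Descend to a leaf, then run KNN on the examples stored in that leaf only."""
    node = tree
    while "leaf" not in node:
        closer_to_p1 = math.dist(query, node["p1"]) <= math.dist(query, node["p2"])
        node = node["left"] if closer_to_p1 else node["right"]
    nearest = sorted(node["leaf"], key=lambda e: math.dist(e[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Toy data (hypothetical).
examples = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
            ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b")]
tree = build(examples, leaf_size=3)
print(classify(tree, (0.5, 0.5)), classify(tree, (5.5, 5.5)))
```

The speed-up comes from the final KNN step seeing only the examples in one leaf rather than the whole training set; the trade-off is that true nearest neighbors lying in a sibling leaf are missed.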