Anomaly detection in a dataset involves identifying circumstances that are uncommon in otherwise homogeneous data. In network traffic data, it is generally difficult to determine the type of attack without proper analysis, and this is especially true when a record of traffic generated by thousands of internet users is simply inspected for malicious activity. Moreover, datasets differ in the way they record or acquire data. This paper compares and analyses multiple datasets, including NSL-KDD and MAWI, using the K-means clustering algorithm. Specifically, the paper analyses the blind spots of each dataset and evaluates which dataset is most suitable for K-means clustering. The paper's quantitative analysis results help to identify the weaknesses of each traffic dataset when the K-means clustering algorithm is used.
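The abstract does not describe how K-means is applied, so the following is only a minimal sketch of one common pattern, not the authors' method. K-means is fitted on records assumed to be normal, and each new record is scored by its distance to the nearest centroid. The two feature values, the farthest-point initialization, the threshold rule (largest score seen on the training data), and all function names are hypothetical choices for illustration; no NSL-KDD or MAWI data is used.

```python
import math


def farthest_point_init(points, k):
    # Hypothetical deterministic seeding: start from the first point, then
    # repeatedly add the point farthest from every centroid chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's K-means on a list of equal-length numeric tuples."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        updated = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:  # converged: assignments can no longer change
            break
        centroids = updated
    return centroids


def anomaly_score(p, centroids):
    # Distance to the nearest centroid: a large value means the record fits no cluster well.
    return min(math.dist(p, c) for c in centroids)


# Hypothetical 2-feature "normal" records (e.g. scaled packet rate, scaled mean packet size).
train = [(1.0 + 0.1 * i, 1.0 + 0.1 * j) for i in range(3) for j in range(3)]
train += [(5.0 + 0.1 * i, 5.0 + 0.1 * j) for i in range(3) for j in range(3)]

centroids = kmeans(train, k=2)
# Hypothetical threshold: the largest score observed on the training records.
threshold = max(anomaly_score(p, centroids) for p in train)

new_records = [(1.1, 1.0), (5.0, 5.1), (10.0, 0.0)]
flags = [anomaly_score(p, centroids) > threshold for p in new_records]
print(flags)  # [False, False, True]
```

Deterministic seeding is used here only to keep the example reproducible. Purely random seeding can place two centroids in the same group or a centroid on an outlier, which is one way a K-means-based detector can develop blind spots.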