In this study, an Intrusion Detection System (IDS) is designed based on Machine Learning classifiers, and its performance is evaluated on the attacks contained in the UNSW-NB15 dataset. The UNSW-NB15 dataset contains 2,540,226 realistic network data instances and 49 features. Most research uses a representative sample of this dataset with predefined training and testing subsets, which includes 257,673 records in total. The dataset was subjected to visual data analysis to identify characteristics likely to challenge Machine Learning classifiers. Pre-processing is necessary before this data can be used for data-driven IDS prototype development because of imbalanced class representation in terms of pattern counts and class overlap in the feature space. Pre-processing consists of min-max scaling for normalization, followed by feature selection with the Elastic Net and Sequential Feature Selection (SFS) algorithms. This work employs an ensemble of three base classifiers, namely Balanced Bagging, XGBoost, and RF-HDDT, each adapted to address the imbalance issue. The parameters of Balanced Bagging and XGBoost are tuned for the imbalanced data, and the Hellinger distance metric supplements Random Forest to address the limitations of the default distance metric. Two new algorithms are proposed to address the class overlap issue in the dataset and are applied during training. These two algorithms help improve performance on the testing dataset by influencing the final classification decision made by the three base classifiers within the ensemble classifier, which employs a majority vote combiner. The performance of the proposed method for binary and multi-category classification was evaluated using standard metrics, including those derived from the confusion matrix, and compared to that of other studies using the same dataset.
The proposed design outperforms those reported in the literature by a significant margin for binary and multi-category classification cases.
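As an illustration of two steps named in the abstract, the following is a minimal sketch of min-max scaling and a majority vote combiner in plain Python. It is not the authors' implementation: the function names, the tie-breaking rule (first classifier in the list wins), the handling of constant features, and the sample labels are all assumptions made for the example.

```python
from collections import Counter
from typing import List, Sequence


def min_max_scale(column: Sequence[float]) -> List[float]:
    """Rescale one feature column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant feature is mapped to 0.0 to avoid dividing by zero.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-classifier label lists into one label per sample.

    predictions[i][j] is the label that classifier i assigns to sample j.
    Assumption: ties go to the earliest classifier whose label reaches the top count.
    """
    combined = []
    for sample_votes in zip(*predictions):
        counts = Counter(sample_votes)
        top = max(counts.values())
        # First label, in classifier order, that has the highest vote count.
        combined.append(next(v for v in sample_votes if counts[v] == top))
    return combined


# Hypothetical feature column and hypothetical outputs of three base classifiers.
scaled = min_max_scale([10.0, 20.0, 40.0])
votes = majority_vote([
    ["normal", "attack", "attack"],
    ["normal", "normal", "attack"],
    ["attack", "normal", "dos"],
])
print(scaled)
print(votes)
```

With three voters, a strict majority exists whenever at least two classifiers agree, so the tie-breaking rule only matters when all three disagree.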
Intrusion detection systems have been widely studied and deployed by researchers to provide better security for computer networks. The increasing volume of attacks, combined with the rapid improvement of machine learning (ML), has made the combination of intrusion detection techniques with machine learning and deep learning a popular subject and a feasible approach for cyber threat protection. Machine learning usually involves training on large amounts of sample data. Since very large input data can degrade the training and detection performance of a machine learning model, feature selection becomes a crucial technique for removing irrelevant and redundant features from the dataset. This study applied a feature selection approach for intrusion detection that combined state-of-the-art feature selection algorithms with attack-characteristic features to produce an optimized feature set, which was then used to train the machine learning model. The CSE-CIC-IDS2018 dataset, the most recent benchmark dataset with a wide diversity of attacks and features, was used to create the efficient feature subset. The experimental results were produced using a decision tree classifier and analyzed with respect to accuracy, precision, recall, and F1 score.
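The four evaluation measures named above can be computed from binary confusion-matrix counts. The sketch below assumes the label encoding 1 = attack and 0 = benign; the function name and the sample labels are hypothetical and do not come from the study.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for labels in {0, 1}."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed attacks

    accuracy = (tp + tn) / len(pairs)
    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical ground truth and predictions for eight flows.
metrics = binary_metrics(
    y_true=[1, 1, 1, 0, 0, 0, 0, 1],
    y_pred=[1, 1, 0, 0, 0, 1, 0, 1],
)
print(metrics)
```

On imbalanced intrusion data, accuracy alone can look high even when most attacks are missed, which is why precision, recall, and F1 are reported alongside it.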