Intrusion detection systems (IDSs) have been intensively studied in the research community. Rapidly evolving cyber threats make it a major challenge for an IDS to achieve a reliable detection rate. Despite the application of various machine learning approaches to improve the efficiency of IDSs, present intrusion detection approaches still struggle to reach good performance. In this paper, the Canadian Institute for Cybersecurity on Intrusion Detection Systems 2017 (CICIDS-2017) dataset was selected. To address the multi-class imbalanced classification problem, multiple imputation by chained equations (MICE) was first applied to handle the missing data in the dataset. Recursive feature elimination (RFE) with a decision tree classifier as the estimator was then used to reduce the number of features based on computed feature importance. The training data was resampled using the synthetic minority oversampling technique combined with edited nearest neighbors (SMOTE-ENN) to improve the detection of minority classes. Four machine learning approaches, namely K-nearest neighbor, random forest, XGBoost, and LightGBM, were trained and tested. The hyperparameter importance of each model was also analyzed using Bayesian optimization with the tree-structured Parzen estimator (BO-TPE) to support further experimentation on hyperparameter tuning. All four machine learning approaches achieved at least 98% on all three performance metrics: accuracy, Matthews correlation coefficient (MCC), and area under the receiver operating characteristic curve (AUROC).
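As a rough illustration of why MCC is reported next to accuracy on an imbalanced multi-class problem, the following minimal sketch computes both metrics in plain Python. It assumes the standard multi-class generalization of MCC; the abstract does not state which formulation the authors used. The function names and the toy labels are hypothetical and are not taken from CICIDS-2017.

```python
from collections import Counter
from math import sqrt


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def multiclass_mcc(y_true, y_pred):
    """Multi-class Matthews correlation coefficient (assumed formulation).

    Uses the covariance form: with s samples, c correct predictions,
    t_k true occurrences and p_k predicted occurrences of class k,
    MCC = (c*s - sum_k t_k*p_k) / sqrt((s^2 - sum_k p_k^2) * (s^2 - sum_k t_k^2)).
    Returns 0.0 when the denominator is zero.
    """
    s = len(y_true)
    c = sum(t == p for t, p in zip(y_true, y_pred))
    t_counts = Counter(y_true)
    p_counts = Counter(y_pred)
    labels = set(t_counts) | set(p_counts)

    numerator = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    cov_true = s * s - sum(v * v for v in t_counts.values())
    cov_pred = s * s - sum(v * v for v in p_counts.values())
    denominator = sqrt(cov_true * cov_pred)
    return numerator / denominator if denominator else 0.0


# Toy, hypothetical labels: 8 majority-class samples and 2 minority-class samples.
y_true = ["benign"] * 8 + ["attack"] * 2
# A degenerate classifier that always predicts the majority class.
y_majority = ["benign"] * 10

print("accuracy:", accuracy(y_true, y_majority))
print("MCC:", multiclass_mcc(y_true, y_majority))
```

On these toy labels the always-majority classifier keeps a high accuracy while its MCC drops to zero, which is the kind of gap that motivates resampling the minority classes and reporting MCC and AUROC in addition to accuracy.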