Handling Imbalanced Data in Intrusion Detection Systems using Generative Adversarial Networks

Vu, Ly; Nguyen, Quang Uy

doi:10.32913/mic-ict-research.v2020.n1.894

Cited by 8 publications

(14 citation statements)

References 33 publications


“…The experimental results show that GAN's balanced attack sample dataset produces more accurate results than the unbalanced attack sample set. Vu and Nguyen proposed a method based on Auxiliary Classifier Generative Adversarial Network (ACGAN) to enhance the balance of the dataset [24]. The method achieved better performance than machine learning algorithms trained on the original dataset and other sampling techniques.…”

Section: Generative Adversarial Network (mentioning)

confidence: 99%


A GAN and Feature Selection-Based Oversampling Technique for Intrusion Detection

Liu, Zhang, et al. 2021

Security and Communication Networks


In recent years, there have been numerous cyber security issues that have caused considerable damage to the society. The development of efficient and reliable Intrusion Detection Systems (IDSs) is an effective countermeasure against the growing cyber threats. In modern high-bandwidth, large-scale network environments, traditional IDSs suffer from a high rate of missed and false alarms. Researchers have introduced machine learning techniques into intrusion detection with good results. However, due to the scarcity of attack data, such methods’ training sets are usually unbalanced, affecting the analysis performance. In this paper, we survey and analyze the design principles and shortcomings of existing oversampling methods. Based on the findings, we take the perspective of imbalance and high dimensionality of datasets in the field of intrusion detection and propose an oversampling technique based on Generative Adversarial Networks (GAN) and feature selection. Specifically, we model the complex high-dimensional distribution of attacks based on Gradient Penalty Wasserstein GAN (WGAN-GP) to generate additional attack samples. We then select a subset of features representing the entire dataset based on analysis of variance, ultimately generating a rebalanced low-dimensional dataset for machine learning training. To evaluate the effectiveness of our proposal, we conducted experiments based on the NSL-KDD, UNSW-NB15, and CICIDS-2017 datasets. The experimental results show that our method can effectively improve the detection performance of machine learning models and outperform the baselines.
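The feature selection step described in this abstract ranks features by analysis of variance. The following is a minimal sketch of that idea in plain Python, not the authors' implementation: it computes the one-way ANOVA F-statistic of each feature across class labels and keeps the highest-scoring ones. The function names `anova_f` and `select_top_k` and the toy data are hypothetical, and the sketch assumes at least two classes and more samples than classes.

```python
from collections import defaultdict


def anova_f(values, labels):
    """One-way ANOVA F-statistic of a single feature across class labels."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[y].append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Between-class and within-class sums of squares.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_k(rows, labels, k):
    """Return the indices of the k features with the largest F-statistic."""
    n_features = len(rows[0])
    scores = [anova_f([r[j] for r in rows], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
rows = [[0.1, 5.0], [0.2, 4.0], [0.9, 5.0], [1.0, 4.0]]
labels = ["normal", "normal", "attack", "attack"]
selected = select_top_k(rows, labels, 1)
```

In practice a library routine such as scikit-learn's `f_classif` with `SelectKBest` would likely replace the hand-written scoring; the abstract does not say which tooling was used.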


Section: Generative Adversarial Network (mentioning)

confidence: 99%

“…We will compare these four methods. In addition, we also compare them with the GAN-based methods of Vu and Nguyen [24] and Lee and Park [22]. The F-measure is an overall evaluation of the Precision and Recall, which we use to measure the methods' performance.…”

Section: Experiments III (mentioning)

confidence: 99%
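The F-measure mentioned in the statement above combines Precision and Recall into a single score. A minimal sketch of the usual definition follows, assuming the standard F-beta formula with beta = 1 by default; the excerpt does not say which beta the citing paper uses. The function name `f_measure` and the counts are hypothetical.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """Return (precision, recall, F_beta) from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical counts for a rare attack class:
# 8 attacks detected, 2 false alarms, 8 attacks missed.
p, r, f1 = f_measure(tp=8, fp=2, fn=8)
```

Because the score depends only on the positive class counts, it is less flattering than plain accuracy when the attack class is rare, which is why it is a common choice for imbalanced intrusion data.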

A GAN and Feature Selection-Based Oversampling Technique for Intrusion Detection

Liu, Zhang, et al. 2021

Security and Communication Networks


“…Low data regimes are found in many real-life applications in which researchers face data scarcity problems [1]. The data scarcity pertains to the situation where one class is abundant in data samples (especially normal behaviour) while the anomaly samples are rare and difficult to gather [2]. The data scarcity can also be described as a data imbalance problem potentially resulting in decision bias in the machine learning (ML) classifiers.…”

Section: Introduction (mentioning)

confidence: 99%

“…Using generative adversarial networks (GANs) as synthetic oversamplers has been a voguish research endeavour for low data regimes [3], [7]. Various researchers have demonstrated that GANs are more effective as compared to other synthetic oversamplers like SMOTE [2], [6], [8], [9]. It is found in many studies that due to the adversarial factor, GANs can better estimate the target probability distribution [2], [8], [10].…”

Section: Introduction (mentioning)

confidence: 99%

“…Various researchers have demonstrated that GANs are more effective as compared to other synthetic oversamplers like SMOTE [2], [6], [8], [9]. It is found in many studies that due to the adversarial factor, GANs can better estimate the target probability distribution [2], [8], [10]. In a simple/vanilla GAN, two different neural networks generator (G) and discriminator (D) work antagonistically to learn from each other's experience to converge to Nash equilibrium [11].…”

Section: Introduction (mentioning)

confidence: 99%
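For reference, the antagonistic game between generator (G) and discriminator (D) in a vanilla GAN is usually written as the minimax objective below. This is the standard textbook formulation, given here as an aid to the reader; the excerpt itself does not reproduce it, and the symbols $p_{\text{data}}$ (data distribution) and $p_z$ (noise prior) are the conventional ones rather than notation taken from the citing paper.

```latex
\min_{G} \max_{D} \; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

At the equilibrium of this game the generator's distribution matches $p_{\text{data}}$ and the discriminator outputs 1/2 everywhere, which is the sense in which the two networks "converge to Nash equilibrium".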


EVAGAN: Evasion Generative Adversarial Network for Low Data Regimes

Randhawa, Aslam, Alauthman, et al. 2021

Preprint


Many recent literary works have leveraged generative adversarial networks (GANs) to spawn unseen evasion samples. The purpose is to annex the generated data with the original train set for adversarial training to improve the detection performance of machine learning (ML) classifiers. The quality of generating adversarial samples relies on the adequacy of training data samples. However, in low data regimes like medical anomaly detection, drug discovery and cybersecurity, the attack samples are scarce in number. This paper proposes a novel GAN design called Evasion Generative Adversarial Network (EVAGAN) that is more suitable for low data regime problems that use oversampling for detection improvement of ML classifiers. EVAGAN not only can generate evasion samples, but its discriminator can act as an evasion aware classifier. We have considered Auxiliary Classifier GAN (ACGAN) as a benchmark to evaluate the performance of EVAGAN on cybersecurity (ISCX-2014, CIC-2017 and CIC2018) botnet and CV (MNIST) datasets. We demonstrate that EVAGAN outperforms ACGAN for unbalanced datasets with respect to detection performance, training stability, time complexity. EVAGAN's generator quickly learns to generate the low sample class and hardens its discriminator simultaneously. In contrast to ML classifiers that require security hardening after being adversarially trained by GAN generated data, EVAGAN renders it needless. The experimental analysis proves EVAGAN to be an efficient evasion hardened model for low data regimes in cybersecurity and CV. Code will be available at https://github.com/rhr407/EVAGAN.

Impact Statement: The applications of Artificial Intelligence (AI) can help improve the quality of human life. The use of AI is not only limited to medical anomaly detection and drug discovery but can be leveraged in computer networks to keep people safe from malicious activities on the Internet. However, the AI-based models can be biased towards the majority class of data on which they are trained due to data imbalance. Anomaly data samples are always scarce as compared to the normal data samples. So this is an open research problem to solve. Our work is an effort to improve the AI-based methods in detection performance, time complexity and stability. Using the proposed technique, we can train our AI model using fewer anomaly samples efficiently and reduce the time complexity compared to the state-of-the-art in anomaly detection.


Intrusion Traffic Detection and Classification Based on Unsupervised Learning

Zhong, Xie, Tang 2024

IEEE Access


To address the problem that existing intrusion traffic detection models generally rely on machine learning and supervised deep learning algorithms and classify small-sample classes with low accuracy, an unsupervised intrusion traffic classification model based on the Wasserstein divergence objective for generative adversarial nets (WGAN-div) and information maximizing generative adversarial nets (Info GAN) is presented. The algorithm uses generative adversarial networks to optimize the sampling of unbalanced data sets and improves the model's feature extraction capability for small-sample classes. First, the unbalanced training set is oversampled by WGAN-div to improve the data distribution. Then, the non-numeric part is processed by one-hot encoding and integrated with the numeric part to reduce the complexity of preprocessing. Finally, the Info GAN model is used for training. Performance evaluation and comparison were carried out on the NSL-KDD, CICIDS2017 and UNSW-NB15 data sets. The experimental results show that the accuracy on the multi-classification task is 91.1%, 97.1% and 79.9% respectively, and the accuracy on the binary classification task is 90.9%, 96.9% and 86.1% respectively. Compared with classical deep learning algorithms, the Info GAN model has higher accuracy and a lower false positive rate, and has higher reliability and engineering application value.
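The encoding step in this abstract most plausibly refers to one-hot encoding of categorical traffic fields, which are then concatenated with the numeric fields. A minimal sketch under that assumption follows; the function name `one_hot_encode` and the field names `duration` and `protocol` are hypothetical and are not taken from the paper.

```python
def one_hot_encode(records, categorical_keys):
    """Encode categorical fields as 0/1 indicators and append them to the numeric fields."""
    # One sorted vocabulary per categorical field, built from the data itself.
    vocab = {k: sorted({r[k] for r in records}) for k in categorical_keys}
    encoded = []
    for r in records:
        # Numeric fields are kept as floats, in the record's own key order.
        row = [float(v) for k, v in r.items() if k not in categorical_keys]
        for k in categorical_keys:
            row.extend(1.0 if r[k] == c else 0.0 for c in vocab[k])
        encoded.append(row)
    return encoded, vocab


# Hypothetical flow records with one numeric and one categorical field.
records = [
    {"duration": 2, "protocol": "tcp"},
    {"duration": 0, "protocol": "udp"},
    {"duration": 5, "protocol": "icmp"},
]
vectors, vocab = one_hot_encode(records, ["protocol"])
```

The vocabulary is returned alongside the vectors so that the same column order can be reused when encoding a test set.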

