Daniel Schlör scite author profile

Flow-based data sets are necessary for evaluating network-based intrusion detection systems (NIDS). In this work, we propose a novel methodology for generating realistic flow-based network traffic. Our approach is based on Generative Adversarial Networks (GANs), which achieve good results for image generation. A major challenge lies in the fact that GANs can only process continuous attributes. However, flow-based data inevitably contain categorical attributes such as IP addresses or port numbers. Therefore, we propose three different preprocessing approaches that transform flow-based data into continuous values. Further, we present a new method for evaluating the generated flow-based network traffic, which uses domain knowledge to define quality tests. We use the three approaches to generate flow-based network traffic based on the CIDDS-001 data set. Experiments indicate that two of the three approaches are able to generate high-quality data.

However, labeled data sets are necessary for training supervised data mining methods (e.g. classification algorithms) and provide the basis for evaluating the performance of supervised as well as unsupervised data mining algorithms.

Objective. Large training data sets with high variance can increase the robustness of anomaly-based intrusion detection methods. Therefore, we intend to build a generative model which allows us to generate realistic flow-based network traffic. The generated data can be used to improve the training of anomaly-based intrusion detection methods as well as for their evaluation. To that end, we propose an approach that learns the characteristics of collected network traffic and generates new flow-based network traffic with the same underlying characteristics.

Approach and Contributions. Generative Adversarial Networks (GANs) [4] are a popular method to generate synthetic data by learning from a given set of input data. GANs consist of two networks, a generator network G and a discriminator network D. The generator network G is trained to generate synthetic data from noise. The discriminator network D is trained to distinguish generated synthetic data from real-world data. The generator network G is trained using the gradient of the output signal of the discriminator network D. G and D are trained iteratively until the generator network G is able to fool the discriminator network D. GANs achieve remarkably good results in image generation [5,6,7,8]. Furthermore, GANs have also been used for generating text [9] or molecules [10]. This work uses GANs to generate complete flow-based network traffic with all typical attributes. To the best of our knowledge, this is the first work that uses GANs for this purpose. GANs can only process continuous input attributes. This poses a major challenge since flow-based network data consist of continuous and categorical attributes. Consequently, we analyze different preprocessing strategies to transform categorical attributes of flow-based network data into continuous attributes. The first method simply ...
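As a rough illustration of what turning categorical flow attributes into continuous values can look like, the following minimal Python sketch scales IPv4 octets and port numbers into the range [0, 1]. This is an assumption made purely for illustration and is not necessarily any of the three preprocessing approaches from the paper; the function names `ip_to_continuous`, `port_to_continuous`, and `encode_flow` are hypothetical.

```python
import ipaddress
from typing import List

MAX_PORT = 65535  # largest valid TCP/UDP port number


def ip_to_continuous(ip: str) -> List[float]:
    """Map a dotted IPv4 address to four values in [0, 1], one per octet.

    Hypothetical helper: one possible numeric encoding, not the paper's.
    """
    packed = ipaddress.IPv4Address(ip).packed  # the four raw address bytes
    return [octet / 255.0 for octet in packed]


def port_to_continuous(port: int) -> float:
    """Scale a port number into [0, 1]."""
    if not 0 <= port <= MAX_PORT:
        raise ValueError(f"port out of range: {port}")
    return port / MAX_PORT


def encode_flow(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> List[float]:
    """Concatenate the encoded addresses and ports into one continuous vector."""
    return (
        ip_to_continuous(src_ip)
        + ip_to_continuous(dst_ip)
        + [port_to_continuous(src_port), port_to_continuous(dst_port)]
    )


if __name__ == "__main__":
    # Two addresses (4 values each) plus two ports give a 10-dimensional vector.
    print(encode_flow("192.168.0.1", "10.0.0.2", 443, 51234))
```

An encoding like this treats categorical values as if they were ordered quantities, which is a simplification; it is shown only to make the continuous-versus-categorical problem concrete.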


Extracting Semantics from Unconstrained Navigation on Wikipedia

Niebler, Schlör, Becker, et al. 2015. Künstl Intell


Towards Explainable Occupational Fraud Detection

Tritscher, Schlör, Gwinner, et al. 2023


Malware detection on windows audit logs using LSTMs

Ring, Schlör, Wunderlich, et al. 2021. Computers & Security


Financial Fraud Detection with Improved Neural Arithmetic Logic Units

Schlör, Ring, Krause, et al. 2021




Flow-based network traffic generation using Generative Adversarial Networks
