Dominik Soukup scite author profile

Machine learning is recognised as a relevant approach for detecting attacks and other anomalies in network traffic. However, there are still no suitable network datasets that would enable effective detection. Preparing a network dataset is difficult, both for privacy reasons and because of the lack of tools for assessing dataset quality. In a previous paper, we proposed a new method for data quality assessment based on permutation testing. This paper presents a parallel study on the limits of detection of that approach. We focus on the problem of network flow classification and use well-known machine learning techniques. The experiments were performed using publicly available network datasets.


Dataset Quality Assessment with Permutation Testing Showcased on Network Traffic Datasets

Wasielewska, Soukup, Čejka et al. 2022

Preprint


Intelligent and autonomous networks require precise and fast mechanisms that ensure error-free and efficient operation. Modern solutions are increasingly based on artificial intelligence, in particular on machine learning, to reliably process huge amounts of data. Therefore, high-quality datasets are essential to train machine learning models. Unfortunately, the problem of assessing the quality of datasets is very challenging and often overlooked. This paper proposes a method for assessing dataset quality in the context of binary classification. It is based on permutation testing and examines the strength of the relationship between observations and labels. Experiments carried out on simulated and real network datasets show that the method is sensitive enough to detect errors and mislabels in the labeled dataset. We also present theoretical considerations justifying our results.
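
The general idea of a label-permutation test can be illustrated with a minimal sketch. This is not the method from the paper: it is a generic permutation test, assuming a toy one-dimensional dataset and a nearest-centroid classifier as a stand-in model. The function names (`centroid_accuracy`, `permutation_p_value`), the synthetic data, and all parameters are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean


def centroid_accuracy(xs, ys):
    """Training accuracy of a 1-D nearest-centroid classifier (stand-in model).

    Assumes both classes (0 and 1) are present in ys.
    """
    c0 = mean(x for x, y in zip(xs, ys) if y == 0)
    c1 = mean(x for x, y in zip(xs, ys) if y == 1)
    preds = [0 if abs(x - c0) <= abs(x - c1) else 1 for x in xs]
    return mean(int(p == y) for p, y in zip(preds, ys))


def permutation_p_value(xs, ys, n_permutations=200, seed=0):
    """Return (observed accuracy, permutation p-value).

    The p-value is the share of label shufflings that score at least as
    well as the original labels, with the usual +1 correction.
    """
    rng = random.Random(seed)
    observed = centroid_accuracy(xs, ys)
    shuffled = list(ys)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any link between observations and labels
        if centroid_accuracy(xs, shuffled) >= observed:
            at_least_as_good += 1
    return observed, (at_least_as_good + 1) / (n_permutations + 1)


# Synthetic data (an assumption): two well-separated Gaussian classes.
data_rng = random.Random(42)
xs = [data_rng.gauss(0, 1) for _ in range(100)] + [data_rng.gauss(3, 1) for _ in range(100)]
clean_labels = [0] * 100 + [1] * 100

# Simulated worst case: labels carry no information about the observations.
random_labels = clean_labels[:]
data_rng.shuffle(random_labels)

acc_clean, p_clean = permutation_p_value(xs, clean_labels)
acc_random, p_random = permutation_p_value(xs, random_labels)
```

With informative labels, few or no shufflings should match the original score, so the p-value stays small. When the labels are unrelated to the observations, the original labelling is no better than a random one, and the p-value gives no evidence of a relationship.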



Dominik Soukup

Towards Evaluating Quality of Datasets for Network Traffic Domain

Security Framework for IoT and Fog Computing Networks

Behavior Anomaly Detection in IoT Networks

Evaluation of the Limit of Detection in Network Dataset Quality Assessment with PerQoDA

Dataset Quality Assessment with Permutation Testing Showcased on Network Traffic Datasets
