2021
DOI: 10.48550/arxiv.2110.05977
Preprint

Datasets are not Enough: Challenges in Labeling Network Traffic

Abstract: In contrast to previous surveys, the present work is not focused on reviewing the datasets used in the network security field. Many of the available public labeled datasets represent network behavior only for a particular time period. Given the rate of change in malicious behavior and the serious challenge of labeling and maintaining these datasets, they quickly become obsolete. Therefore, this work is focused on the analysis of current labeling methodologies applied to network-based data. In t…

Cited by 3 publications (7 citation statements)
References 57 publications
“…The characteristics of obtained data are one of the challenges to tackle to achieve successful deployment of ML-based NIDS. Researchers often use benchmark datasets which contain features unobtainable in real-time [51] [25]. Flow-based features provide a useful overview of the activity of a network [1] [24].…”
Section: Methods 31 Flow-based Data
Citation type: mentioning
confidence: 99%
“…An essential aspect of network traffic classification is identifying applications used within the network. However, this task can be challenging due to the limited availability of datasets [1–3]. To advance this field, it is crucial to provide comprehensive and up-to-date datasets.…”
Section: Objective
Citation type: mentioning
confidence: 99%
“…Realistic and labelled datasets are a necessity when developing data-driven capabilities for both threat hunting and intrusion detection [1], [34], [42]. Datasets used to build such hunting or detection capabilities come with a large set of requirements from different sources: R1) datasets must contain modern attack data that is representative of current trends [20], [28]; R2) datasets need to be representative and accurate [20]; R3) datasets must provide all the relevant behavioural patterns for malicious and normal activities, and network traces [8], [29];…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Both the source code of LADEMU and generated dataset can be found here: https://github.com/FFI-no/Paper-LADEMU R4) datasets must capture the stages and strategies involved in the attacks to defend against Advanced Persistent Threats (APTs) [1]; R5) datasets must contain ground truth 1 of the datapoints; to develop capabilities to detect APTs, or perform kill-chain detection, the labels must be fine-grained and indicate the different stages of an attack/campaign [8], [20], [20]. Satisfying these requirements is far from easy.…”
Section: Introduction
Citation type: mentioning
confidence: 99%