2022
DOI: 10.3390/bdcc6020038
Spark Configurations to Optimize Decision Tree Classification on UNSW-NB15

Abstract: This paper looks at the impact of changing Spark’s configuration parameters on machine learning algorithms using a large dataset—the UNSW-NB15 dataset. The environmental conditions that will optimize the classification process are studied. To build smart intrusion detection systems, a deep understanding of the environmental parameters is necessary. Specifically, the focus is on the following environmental parameters: the executor memory, number of executors, number of cores per executor, execution time, as wel…

Cited by 10 publications (7 citation statements); references 15 publications.
“…Forty-nine network traffic features were extracted, and the Argus and Bro-IDS programs were employed. There are 2,540,044 streams in the dataset, including 2,218,761 benign and 321,283 attack streams [17,28]. We split it into 20% for tests and 80% for training, with a stratification option to keep these percentages static for all classes.…”
Section: UNSW-NB15 (mentioning)
confidence: 99%
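The stratified 80/20 split described in the statement above can be sketched as follows. This is a minimal, self-contained illustration in plain Python (the paper itself works in Spark; the function name and toy class ratios are my own, not taken from the source):

```python
import random

def stratified_split(X, y, test_frac=0.20, seed=42):
    """Split (X, y) so each class keeps the same proportion in both parts."""
    rng = random.Random(seed)
    # Group sample indices by class label.
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    train_idx, test_idx = [], []
    # Carve off test_frac of each class independently, so the
    # benign/attack ratio stays the same in train and test.
    for label, idxs in by_class.items():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_frac)
        test_idx += idxs[:n_test]
        train_idx += idxs[n_test:]
    return ([X[i] for i in train_idx], [X[i] for i in test_idx],
            [y[i] for i in train_idx], [y[i] for i in test_idx])

# Toy imbalanced data: 80% benign (label 0), 20% attack (label 1).
X = list(range(100))
y = [0] * 80 + [1] * 20
X_tr, X_te, y_tr, y_te = stratified_split(X, y)
print(len(X_tr), len(X_te))   # 80 20
print(sum(y_tr) / len(y_tr))  # 0.2 (attack fraction in training set)
print(sum(y_te) / len(y_te))  # 0.2 (attack fraction in test set)
```

Without stratification, a random split of a dataset this imbalanced (about 13% attack streams in UNSW-NB15) could leave the minority class under-represented in the test set.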
“…The lower right part shows the number of connection records correctly classified as the normal class. The accuracy rate of each algorithm can be calculated from formula (6), the recall rate of each algorithm can be calculated from formula (7), and the F1 score of each algorithm can be calculated from formula (8).…”
Section: Analysis Of Binary Classification Experimental Results (mentioning)
confidence: 99%
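The metrics referenced above (formulas (6)–(8) in the citing paper) are the standard binary-classification definitions computed from confusion-matrix counts. A minimal sketch, assuming the conventional TP/FP/FN/TN layout (the counts below are illustrative, not taken from the paper):

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, recall, and F1 from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)   # formula (6): correct / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                       # formula (7): TP / (TP + FN)
    f1 = 2 * precision * recall / (precision + recall)  # formula (8)
    return accuracy, recall, f1

# Illustrative counts only (not results from the paper).
acc, rec, f1 = binary_metrics(tp=90, fp=10, fn=5, tn=95)
print(round(acc, 3), round(rec, 3), round(f1, 3))  # 0.925 0.947 0.923
```

F1 is the harmonic mean of precision and recall, so it penalizes a classifier that trades one heavily for the other — which matters on imbalanced intrusion-detection data where accuracy alone can be misleading.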
“…Moreover, this paper visualizes and analyzes the network traffic data after detection and classification, which verifies the effectiveness of the algorithm more intuitively. The dataset used in the experiments of this paper is UNSW-NB15 [6]. Compared with KDD-CUP99 [7] and NSL-KDD [8], the UNSW-NB15 dataset covers more attack types, better reflects present-day internet traffic, and thus has more practical reference value.…”
Section: Introduction (mentioning)
confidence: 99%
“…The analysis provides information, investigates challenges, offers fundamental analyses of data in terms of security, and forecasts future opportunities for machine learning in networking. Of all the proposed classifiers, random forest performed best with 86.99% accuracy, while AdaBoost performed worst with 83.67% test accuracy [19].…”
Section: Literature Review (mentioning)
confidence: 99%