Development of a network intrusion detection system using Apache Hadoop and Spark

Kato, K.; Klyuev, Vitaly

doi:10.1109/desec.2017.8073860

Cited by 27 publications

(11 citation statements)

References 20 publications

“…The Pipeline will create the workflow of the algorithm and the implementation done based on the ordering of passed variables that it means first the StringIndexer, second VectorAssembler, third RandomForest, fourth IndexToString and then the model training on the training dataset after that make a prediction on the testing dataset. Kato and Klyuev [24] proposed the anomaly detection system with Apache Spark and Hadoop and by use of Hive table and unsupervised learning algorithm like K-means and also GMM algorithm, this system capable of managing and detecting an enormous dataset about 90 GB quickly with low rate of false alarm and high value about 86% of accuracy. Gupta and Kulariya [25] proposed a framework for intrusion detection system based on Apache Spark, they used feature selections as correlation based and chi-squared with different algorithms such as Random Forest, Logistic Regression and other algorithms and evaluated the performance of each algorithm on NSL-KDD and KDD'99.…”

Section: Analysis Of Empirical Results

mentioning

confidence: 99%

Network Anomaly Detection by Means of Machine Learning: Random Forest Approach with Apache Spark

Hajialian, Toma

2018

Nowadays the network security is a crucial issue and traditional intrusion detection systems are not a sufficient way. Hence the intelligent detection systems should have a major role in network security by taking into consideration to process the network big data and predict the anomalies behavior as fast as possible. In this paper, we implemented a well-known supervised algorithm Random Forest Classifier with Apache Spark on NSL-KDD dataset provided by the University of New Brunswick with the accuracy of 78.69% and 35.2% false negative ratio. Empirical results show this approach is well in order to use for intrusion detection system as well as we seeking the best number of trees to be used on Random Forest Classifier for getting higher accuracy and lower cost for the intrusion detection system.
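The abstract above reports an accuracy and a false negative ratio without saying how they were computed. The sketch below shows one common way to derive both from binary confusion-matrix counts. It assumes "attack" is the positive class and that the false negative ratio means FN / (TP + FN); the abstract does not confirm either choice. The counts are hypothetical, chosen only to illustrate the arithmetic, and are not the paper's results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, with 'attack' as the positive class (assumption)."""
    tp: int  # attacks correctly flagged as attacks
    fn: int  # attacks missed, i.e. labelled as normal traffic
    fp: int  # normal traffic wrongly flagged as attacks
    tn: int  # normal traffic correctly labelled as normal


def accuracy(c: ConfusionCounts) -> float:
    """Share of all records that were classified correctly."""
    total = c.tp + c.fn + c.fp + c.tn
    return (c.tp + c.tn) / total if total else 0.0


def false_negative_ratio(c: ConfusionCounts) -> float:
    """Share of actual attacks that the classifier missed: FN / (TP + FN)."""
    attacks = c.tp + c.fn
    return c.fn / attacks if attacks else 0.0


# Hypothetical counts, not taken from the paper.
counts = ConfusionCounts(tp=65, fn=35, fp=10, tn=90)
print(f"accuracy = {accuracy(counts):.2%}")
print(f"false negative ratio = {false_negative_ratio(counts):.2%}")
```

The two figures measure different things: accuracy covers all records, while the false negative ratio covers only actual attacks, so a classifier can score reasonably on the first while still missing a large share of intrusions.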

“…The analysis of some existing data sets (UNB-ISCX-2012 [30], CTU-13 [31], MACCDC [32] or UGR'16 [33]) allows us to observe that they have different formats and feature, so that we can say that cybersecurity data sets are highly heterogeneous.…”

Section: Categorization Of a Cybersecurity Data Set

mentioning

confidence: 99%

Evaluation of Cybersecurity Data Set Characteristics for Their Applicability to Neural Networks Algorithms Detecting Cybersecurity Anomalies

et al. 2020

Artificial intelligence algorithms have a leading role in the field of cybersecurity and attack detection, being able to present better results in some scenarios than classic intrusion detection systems such as Snort or Suricata. In this sense, this research focuses on the evaluation of characteristics for different well-established Machine Learning algorithms commonly applied to IDS scenarios. To do this, a categorization for cybersecurity data sets that groups its records into several groups is first considered. Making use of this division, this work seeks to determine which neural network model (multilayer or recurrent), activation function, and learning algorithm yield higher accuracy values, depending on the group of data. Finally, the results are used to determine which group of data from a cybersecurity data set are more relevant and representative for the intrusion detection, and the most suitable configuration of Machine Learning algorithm to decrease the computational load of the system.

“…Nowadays, there exist different cybersecurity datasets that can be used for IDS based ML experimentation, i.e., UNB-ISCX-1012 [26], CTU-13 [27], MACCDC [28], UGR-16 [29], CICDS [30], KDD-99, NSL-KDD [31], or UNSW-NB15 [32]. Some of them have been widely used, like for instance the dataset KDD-99, which has been stablished as the main benchmark dataset for the different studies cases in the application of ML-based IDS.…”

Section: Cybersecurity Datasets

mentioning

confidence: 99%

An Approach for the Application of a Dynamic Multi-Class Classifier for Network Intrusion Detection Systems

et al. 2020

Currently, the use of machine learning models for developing intrusion detection systems is a technology trend which improvement has been proven. These intelligent systems are trained with labeled datasets, including different types of attacks and the normal behavior of the network. Most of the studies use a unique machine learning model, identifying anomalies related to possible attacks. In other cases, machine learning algorithms are used to identify certain type of attacks. However, recent studies show that certain models are more accurate identifying certain classes of attacks than others. Thus, this study tries to identify which model fits better with each kind of attack in order to define a set of reasoner modules. In addition, this research work proposes to organize these modules to feed a selection system, that is, a dynamic classifier. Finally, the study shows that when using the proposed dynamic classifier model, the detection range increases, improving the detection by each individual model in terms of accuracy.
