2018
DOI: 10.12948/issn14531305/22.4.2018.08

Network Anomaly Detection by Means of Machine Learning: Random Forest Approach with Apache Spark

Abstract: Nowadays, network security is a crucial issue and traditional intrusion detection systems are no longer sufficient. Intelligent detection systems should therefore play a major role in network security by processing network big data and predicting anomalous behavior as fast as possible. In this paper, we implement the well-known supervised Random Forest Classifier algorithm with Apache Spark on the NSL-KDD dataset provided by the University of New Brunswick, reaching an accuracy of 78.…
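The abstract names only the algorithm, engine, and dataset. A minimal Spark MLlib sketch of such a pipeline might look as follows; the file name, column layout, tree count, and split ratio are assumptions for illustration, not details from the paper:

```python
# Minimal Spark MLlib sketch: Random Forest on an NSL-KDD-style CSV.
# File name, column layout, numTrees, and the 80/20 split are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

spark = SparkSession.builder.appName("nsl-kdd-rf").getOrCreate()

# Assumed layout: numeric feature columns plus a string "label" column.
df = spark.read.csv("nsl_kdd_train.csv", header=True, inferSchema=True)
feature_cols = [name for name, dtype in df.dtypes
                if dtype != "string" and name != "label"]

pipeline = Pipeline(stages=[
    StringIndexer(inputCol="label", outputCol="indexedLabel"),      # encode normal/attack labels
    VectorAssembler(inputCols=feature_cols, outputCol="features"),  # pack features into one vector
    RandomForestClassifier(labelCol="indexedLabel",
                           featuresCol="features",
                           numTrees=100),                           # tree count is an assumption
])

train, test = df.randomSplit([0.8, 0.2], seed=42)
model = pipeline.fit(train)
predictions = model.transform(test)

accuracy = MulticlassClassificationEvaluator(
    labelCol="indexedLabel", metricName="accuracy").evaluate(predictions)
print(f"test accuracy: {accuracy:.3f}")
```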

Cited by 8 publications (6 citation statements)
References 11 publications
“…In contrast, RF builds multiple decision trees and chooses the random subspaces of the features for each of them. Then, the votes of trees are aggregated and the class with the most votes is the prediction result [34]. As an excellent classification model, RF can successfully reduce the overfitting and calculate the nonlinear and interactive effects of variables.…”
Section: Random Forest
confidence: 99%
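The voting rule described in this statement can be shown in a few lines of plain Python. This is a toy illustration, not code from the cited papers; the stub trees and feature names are made up:

```python
# Toy illustration of the vote aggregation described above:
# each tree votes for a class, the forest returns the majority class.
from collections import Counter

def forest_predict(trees, sample):
    votes = [tree(sample) for tree in trees]      # one vote per tree
    return Counter(votes).most_common(1)[0][0]    # class with the most votes wins

# Hypothetical stub trees, each acting on a different feature subspace.
trees = [lambda x: "anomaly" if x["src_bytes"] > 1000 else "normal",
         lambda x: "anomaly" if x["duration"] > 30 else "normal",
         lambda x: "normal"]

# Two of the three trees vote "normal", so the forest predicts "normal".
print(forest_predict(trees, {"src_bytes": 5000, "duration": 2}))
```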
“…Several works have explored applying ensembles to NIDS and shown that these ensemble approaches, usually random forests, can be highly effective. In [11], the authors search for the optimal number of decision trees to include in the forest and explore the performance/efficiency tradeoffs of different forest sizes when run in Apache Spark. In [12], the authors compare random forests with individual algorithms such as Naive Bayes, SVM, K-NN, and a decision tree, finding that the random forest offers the highest precision and accuracy.…”
Section: Related Work
confidence: 99%
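A tree-count sweep of the kind attributed to [11] could be sketched with Spark MLlib as below; the synthetic data, candidate sizes, and wall-clock timing are illustrative assumptions rather than details from that work:

```python
# Sketch of a tree-count sweep: train a forest for each candidate size
# and record accuracy and training time to expose the tradeoff.
import time
from pyspark.sql import SparkSession
from pyspark.ml.linalg import Vectors
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

spark = SparkSession.builder.appName("rf-tree-sweep").getOrCreate()

# Tiny synthetic stand-in for the assembled NSL-KDD features ("features", "label").
data = spark.createDataFrame(
    [(Vectors.dense([float(i % 7), float(i % 3)]), float(i % 2)) for i in range(200)],
    ["features", "label"])
train, test = data.randomSplit([0.8, 0.2], seed=7)

evaluator = MulticlassClassificationEvaluator(labelCol="label", metricName="accuracy")

for num_trees in [10, 25, 50, 100, 200]:   # candidate forest sizes are assumptions
    rf = RandomForestClassifier(labelCol="label", featuresCol="features",
                                numTrees=num_trees)
    start = time.time()
    model = rf.fit(train)
    acc = evaluator.evaluate(model.transform(test))
    print(f"numTrees={num_trees:4d}  accuracy={acc:.3f}  "
          f"train_time={time.time() - start:.1f}s")
```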