2017
DOI: 10.4236/jis.2017.84022

Hoeffding Tree Algorithms for Anomaly Detection in Streaming Datasets: A Survey

Abstract: This survey aims to deliver an extensive, well-structured overview of using machine learning to detect anomalies in streaming datasets. The objective is to show the effectiveness of Hoeffding Trees as a machine learning solution for detecting anomalies in streaming cyber datasets. In this survey we categorize the existing research on Hoeffding Trees that is feasible for this type of study into the following: surveying distributed Hoeffding Tre…

Cited by 21 publications (11 citation statements)
References 17 publications
“…We consider OD on SD, as is the context in [13]. Widely accepted and popular solutions, such as Hoeffding Trees [14] or Online Random Forests [15], achieve good accuracy and robustness in data streams [16] but are not designed to operate on unlabeled data. Over the past couple of years, methods have been proposed that satisfy the unsupervised and online requirement, such as [17][18][19], but just a few, Isolation Forest (iForest) [20], HS-Trees [21], RS-Hash [22] and Loda [23], have been shown to outperform numerous competitors and are therefore regarded as state of the art [24,25].…”
Section: Related Work 21 Aspects On Unsupervised Online Outlier Detection
Citation type: mentioning
confidence: 99%
“…It can save memory by deactivating less promising leaves when memory reaches a limit then it turns back to normal when memory is free [10]. Also, it monitors the available memory and prunes leaves (where sufficient statistics are stored) depending on recent accuracy [11], [12].…”
Section: Decision
Citation type: mentioning
confidence: 99%
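
The memory-management behaviour described in the citation statement above (deactivating less promising leaves under a memory limit, then restoring them when memory frees up) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the surveyed paper or from any library: the `Leaf` class, the `promise` score (share of the stream reaching a leaf times that leaf's error rate), and the use of a cap on active leaves (`max_active`) as a stand-in for a real memory budget.

```python
from dataclasses import dataclass


@dataclass
class Leaf:
    name: str
    seen: int            # examples that reached this leaf
    errors: int          # of those, how many the leaf misclassified
    active: bool = True  # inactive leaves stop updating sufficient statistics


def promise(leaf, total_seen):
    """Hypothetical score: share of the stream reaching the leaf times its error rate."""
    if leaf.seen == 0 or total_seen == 0:
        return 0.0
    return (leaf.seen / total_seen) * (leaf.errors / leaf.seen)


def enforce_memory_limit(leaves, max_active):
    """Keep the max_active most promising leaves active and deactivate the rest.

    Calling this again with a larger max_active reactivates leaves,
    mimicking the return to normal once memory is free.
    """
    total_seen = sum(leaf.seen for leaf in leaves)
    ranked = sorted(leaves, key=lambda leaf: promise(leaf, total_seen), reverse=True)
    for rank, leaf in enumerate(ranked):
        leaf.active = rank < max_active
    return [leaf.name for leaf in leaves if leaf.active]


leaves = [Leaf("a", 500, 50), Leaf("b", 300, 3), Leaf("c", 200, 60)]
tight = enforce_memory_limit(leaves, max_active=2)    # assumed memory pressure
relaxed = enforce_memory_limit(leaves, max_active=3)  # assumed memory freed
print(tight, relaxed)
```

In this sketch leaf "b" is deactivated under the tighter limit because it sees few errors, so growing it further is unlikely to improve accuracy much; a real implementation would measure actual memory use rather than count leaves.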
“…Research on Hoeffding has also been conducted by several researchers. The Hoeffding Tree algorithm has been implemented using a streaming dataset [13]. Hoeffding is also used to classify diabetes mellitus [14].…”
Section: B Related Work
Citation type: mentioning
confidence: 99%