2020
DOI: 10.3390/s20051261

Designing a Streaming Algorithm for Outlier Detection in Data Mining—An Incremental Approach

Abstract: Designing algorithms that detect outliers in streaming data has become an important task in many common applications, in areas such as fraud detection, network analysis, and environmental monitoring. Because real-time data may arrive as streams rather than batches, properties such as concept drift, temporal context, transiency, and uncertainty need to be considered. In addition, data processing needs to be incremental, scalable, and feasible with limited memory. Th…

Citations: cited by 29 publications (23 citation statements)
References: 41 publications
“…Additionally, the accuracy and reliability of the microcirculatory functional values are two important determinants of the common microcirculatory framework. Hence, it is necessary to pre-process outliers for the input data set [30]. Comparing various outlier processing algorithms (missing values, mean correction, etc.)…”
Section: Discussion (mentioning)
confidence: 99%
“…The output is the update of each CMC and WMC. The pseudocode starts with the energy of all CMCs with the value of 1/Decay. Afterwards, for each T in CMCs, the energy is checked; whenever it is lower than 0, T is removed from the CMCs and added to the WMCs.…”
Section: BOCEDS Moving Weak Micro-clusters to a Buffer (mentioning)
confidence: 99%
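
The energy check described in the statement above can be illustrated with a minimal sketch. It is not the citing paper's pseudocode: it assumes that every core micro-cluster (CMC) loses 1/Decay energy per update, and that a cluster whose energy falls below 0 is moved to the weak micro-cluster (WMC) buffer. The names `MicroCluster`, `decay_and_demote`, `cmcs`, `wmcs`, and `decay` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MicroCluster:
    # Hypothetical minimal record; a real micro-cluster would also hold
    # a center, a radius, and point counts.
    name: str
    energy: float = 1.0


def decay_and_demote(cmcs, wmcs, decay):
    """Reduce the energy of every CMC and move exhausted ones to the WMC buffer.

    Assumption: each update subtracts 1/decay from the energy.
    Both lists are modified in place.
    """
    if decay <= 0:
        raise ValueError("decay must be positive")
    step = 1.0 / decay
    still_core = []
    for mc in cmcs:
        mc.energy -= step
        if mc.energy < 0:
            wmcs.append(mc)        # demoted: now a weak micro-cluster
        else:
            still_core.append(mc)  # keeps its core status
    cmcs[:] = still_core


# Usage: with decay = 2 each update costs 0.5 energy.
cmcs = [MicroCluster("a", energy=0.3), MicroCluster("b", energy=1.0)]
wmcs = []
decay_and_demote(cmcs, wmcs, decay=2)
print([mc.name for mc in cmcs], [mc.name for mc in wmcs])
```

Building a new list of surviving clusters, rather than deleting entries while iterating, avoids skipping elements during the loop.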
“…Data streams are a continuous, infinite series of data records followed and arranged by embedded or precise timestamps [1]. With sensors becoming prevalent in humans' daily lives, it's clear that the availability of data streams is exponentially increasing [2].…”
Section: Introduction (mentioning)
confidence: 99%
“…Widely accepted and popular solutions, such as Hoeffding Trees [14] or Online Random Forests [15], achieve good accuracy and robustness in data streams [16] but are not designed to operate on unlabeled data. Over the past couple of years, methods have been proposed that satisfy the unsupervised and online requirement, such as [17-19], but just a few, Isolation Forest (iForest) [20], HS-Trees [21], RS-Hash [22] and Loda [23], have been shown to outperform numerous competitors and are therefore regarded as state of the art [24,25]. Even if iForest was originally intended as an offline algorithm, a handful of variants, such as [16, 26-29], have been proposed that are adapting it or are taking advantage of its concept to operate on SD.…”
Section: Related Work, 2.1 Aspects on Unsupervised Online Outlier Detection (mentioning)
confidence: 99%