2016
DOI: 10.1186/s41044-016-0011-3

State-of-the-art on clustering data streams

Abstract: Clustering is a key data mining task: the problem of partitioning a set of observations into clusters such that observations within a cluster are similar and observations in different clusters are dissimilar. The traditional setup, in which a static dataset is available in its entirety for random access, is not applicable to data streams: the entire dataset is not available when learning starts, the data continue to arrive at a rapid rate, the data cannot be accessed randomly, and we can make only one or at most a sma…

Cited by 69 publications
(33 citation statements)
References 52 publications
“…Four most commonly used data structures are feature vectors, prototype arrays, coreset trees and grids. Feature vectors keep the summary of the data instances, prototype arrays keep only a number of representative instances that exemplify the data, coreset trees keep the summary in a tree structure and grids keep the data density in the feature space (Ghesmoune et al, 2016;Mansalis et al, 2018;Silva et al, 2013).…”
Section: Data Structures For Data Streams (mentioning)
confidence: 99%
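
To make the idea of a feature-vector summary concrete, here is a minimal sketch of one common form, the clustering feature triple (count, linear sum, sum of squares) popularised by BIRCH. The quoted passage does not specify this exact layout, so the class name `ClusterFeature` and its methods are illustrative assumptions rather than the structure used by any particular algorithm named above.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ClusterFeature:
    """Hypothetical summary of a group of points: (n, linear sum, squared sum)."""
    n: int = 0
    ls: list = field(default_factory=list)  # per-dimension sum of the points
    ss: float = 0.0                         # sum of squared norms of the points

    def add(self, x):
        """Absorb one point without storing it."""
        if not self.ls:
            self.ls = [0.0] * len(x)
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v
        self.ss += sum(v * v for v in x)

    def merge(self, other):
        """Combine two summaries; the triple is additive."""
        if not self.ls:
            self.ls = [0.0] * len(other.ls)
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss += other.ss

    def centroid(self):
        return [v / self.n for v in self.ls]

    def radius(self):
        """Root mean squared distance of the summarised points to the centroid."""
        c = self.centroid()
        var = self.ss / self.n - sum(v * v for v in c)
        return math.sqrt(max(var, 0.0))  # clamp tiny negative rounding errors


cf = ClusterFeature()
for point in [(0.0, 0.0), (2.0, 0.0)]:
    cf.add(point)
```

Because the three components are plain sums, a summary can be updated in constant time per point and two summaries can be merged without revisiting the data, which is what makes this kind of structure suitable for a single pass over a stream.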
“…Partitioning based algorithms have an easy implementation in general. StreamLSearch (O'Callaghan et al, 2002), incremental k-means (Ordonez, 2003), CluStream (Aggarwal et al, 2003), HP-Stream (Aggarwal et al, 2004), SWClustering (Zhou et al, 2008), StreamKM++ (Ackermann et al, 2012), strAP (Zhang et al, 2014) and CLARA (Kaufman and Rousseeuw, 1990) are partitioning based algorithms (Ghesmoune et al, 2016;Kumar, 2016;Mousavi et al, 2015). -Grid based algorithms use grid data structure.…”
Section: Stream Clustering Algorithms (mentioning)
confidence: 99%
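
As a rough illustration of why partitioning-based methods are considered easy to implement, the sketch below shows a generic sequential (online) k-means update in which each arriving point moves its nearest centre by a running-mean step. This is an assumption-laden toy, not the incremental k-means of Ordonez (2003) or any other algorithm listed in the quotation; the function name `online_kmeans` and the choice to seed the centres with the first k points are hypothetical.

```python
def online_kmeans(stream, k):
    """Single-pass k-means sketch: one update per arriving point."""
    centers, counts = [], []
    for x in stream:
        x = [float(v) for v in x]
        # Assumption: the first k points seed the k centres.
        if len(centers) < k:
            centers.append(x)
            counts.append(1)
            continue
        # Find the nearest centre by squared Euclidean distance.
        j = min(
            range(k),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(centers[i], x)),
        )
        # Running-mean update: the centre stays the mean of its assigned points.
        counts[j] += 1
        step = 1.0 / counts[j]
        centers[j] = [c + step * (a - c) for c, a in zip(centers[j], x)]
    return centers, counts


centers, counts = online_kmeans([(0, 0), (10, 10), (2, 2), (8, 8)], k=2)
```

Each point is read once and then discarded, so memory use depends only on k and the dimensionality, not on the length of the stream.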
“…It is an incremental and dynamic clustering algorithm that follows a hierarchical clustering technique for databases by incrementally constructing a clustering feature (CF) tree, which is a subcluster of data points or better described as a tree-like representation of data points in a data set. 22 Best clustering is achieved by multi-scanning, and having more available memory which maximizes good result. 11 BIRCH is an incremental clustering algorithm that has 4 phases.…”
Section: Balanced Iterative and Clustering Using Hierarchies (mentioning)
confidence: 99%
“…In Ghesmoune et al (2016) the authors discuss 19 algorithms and are among the first to highlight the research area of Neural Gas (NG) for stream clustering. However, only a single grid-based algorithm is discussed and other popular algorithms are missing.…”
Section: Related Work (mentioning)
confidence: 99%