Finding and Tracking Multi-Density Clusters in Online Dynamic Data Streams

Fahy, Conor; Yang, Shengxiang

doi:10.1109/tbdata.2019.2922969

Cited by 22 publications

(20 citation statements)

References 38 publications


“…After calculating its centre c, with Equation (20), and radius r, with Equation (21), the ε-neighbourhood method is again used to find density reachable microclusters. Among them, a process is undertaken to detect the so-called border microclusters [35] inside C, which obviously are not present during the first iteration as C initially contains only one microcluster. Border microclusters are defined as density reachable microclusters that have a density level that is below the density threshold of the first microclusters present in C. Having a threshold that is too high, cluster C will not expand, whilst having a value that is too low, cluster C will contain dissimilar microclusters.…”

Section: Detecting and Forming New Clusters (mentioning)

confidence: 99%

“…Border microclusters are defined as density reachable microclusters that have a density level that is below the density threshold of the first microclusters present in C. Having a threshold that is too high, cluster C will not expand, whilst having a value that is too low, cluster C will contain dissimilar microclusters. Based on the experimental data from the original paper [35], a 10% threshold yields good performance.…”

Section: Detecting and Forming New Clusters (mentioning)

confidence: 99%
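
The border-microcluster rule quoted above can be illustrated with a minimal Python sketch. This is not the citing paper's implementation: the `MicroCluster` class, the `split_reachable` function, the reachability test (centre within `radius + eps` of the cluster centre), and the reading of the 10% threshold as a fraction of the seed microcluster's density are all assumptions made for illustration.

```python
from dataclasses import dataclass
from math import dist
from typing import List, Sequence, Tuple


@dataclass
class MicroCluster:
    """Hypothetical microcluster summary: a centre and a density level."""
    centre: Sequence[float]
    density: float


def split_reachable(
    seed: MicroCluster,
    candidates: List[MicroCluster],
    centre: Sequence[float],
    radius: float,
    eps: float,
    fraction: float = 0.10,
) -> Tuple[List[MicroCluster], List[MicroCluster]]:
    """Split reachable candidates into (core, border) lists.

    Assumption: a candidate is density reachable when its centre lies
    within radius + eps of the cluster centre.
    Assumption: a candidate is a border microcluster when its density is
    below fraction * seed.density (10% by default).
    """
    cutoff = fraction * seed.density
    core: List[MicroCluster] = []
    border: List[MicroCluster] = []
    for mc in candidates:
        if dist(mc.centre, centre) > radius + eps:
            continue  # not reachable from the current cluster
        if mc.density < cutoff:
            border.append(mc)
        else:
            core.append(mc)
    return core, border


seed = MicroCluster((0.0, 0.0), 100.0)
candidates = [
    MicroCluster((0.5, 0.0), 50.0),  # reachable and dense enough
    MicroCluster((1.0, 0.0), 5.0),   # reachable but sparse
    MicroCluster((5.0, 0.0), 80.0),  # too far away
]
core, border = split_reachable(seed, candidates, (0.0, 0.0), radius=1.0, eps=0.5)
```

Under these assumptions, raising `fraction` classifies more reachable microclusters as border, so the cluster stops expanding sooner; lowering it admits sparser, less similar microclusters, which matches the trade-off described in the quoted passage.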

“…MDSC [35] is another single phase method exploiting the SI paradigm inspired by the density based approach introduced in DenStream. In this method, the Ant Colony Optimisation (ACO) algorithm [36] is used to group similar microclusters optimally during the online phase.…”

Section: Introduction (mentioning)

confidence: 99%


A Clustering System for Dynamic Data Streams Based on Metaheuristic Optimisation

et al. 2019


This article presents the Optimised Stream clustering algorithm (OpStream), a novel approach to cluster dynamic data streams. The proposed system displays desirable features, such as a low number of parameters and good scalability with respect to both data dimensionality and the number of clusters in the dataset, and it is based on a hybrid structure using deterministic clustering methods and stochastic optimisation approaches to optimally centre the clusters. Similar to other state-of-the-art methods available in the literature, it uses "microclusters" and other established techniques, such as density-based clustering. Unlike other methods, it makes use of metaheuristic optimisation to maximise performance during the initialisation phase, which precedes the classic online phase. Experimental results show that OpStream outperforms the state-of-the-art methods in several cases, and it is always competitive against other comparison algorithms regardless of the chosen optimisation method. Three variants of OpStream, each coming with a different optimisation algorithm, are presented in this study. A thorough sensitivity analysis is performed by using the best variant to point out OpStream's robustness to noise and resiliency to parameter changes.


“…A good overview on density based stream clustering is provided in [3]. More recent proposals for density-clustering include Ant Colony Stream clustering (ACSC) [19], which uses a decentralised swarm intelligence approach, CEDAS [31] and SNCStream+ [8], which use a graph structure with micro-clusters as nodes, and Multi-Density Stream Clustering (MDSC) [20], which combines both online and off-line phases into a single online phase and can discover clusters with varying levels of density.…”

Section: Related Work (mentioning)

confidence: 99%

“…In summary, the majority of research on dynamic FS for data streams assume the supervised method [15], [33], [40], [42] and is typically used for classification tasks and not suitable for clustering. Existing stream-clustering algorithms can deal with change at the concept level (concept drift and concept evolution) [14], [19], [20], [31]. However, these methods suffer from the curse of dimensionality and are not designed to track change at the feature level.…”

Section: Related Work (mentioning)

confidence: 99%

Dynamic Feature Selection for Clustering High Dimensional Data Streams

Fahy, Yang 2019, IEEE Access (self-citation)


Change in a data stream can occur at the concept level and at the feature level. Change at the feature level can occur if new, additional features appear in the stream or if the importance and relevance of a feature changes as the stream progresses. This type of change has not received as much attention as concept-level change. Furthermore, a lot of the methods proposed for clustering streams (density-based, graph-based, and grid-based) rely on some form of distance as a similarity metric and this is problematic in high-dimensional data where the curse of dimensionality renders distance measurements and any concept of "density" difficult. To address these two challenges we propose combining them and framing the problem as a feature selection problem, specifically a dynamic feature selection problem. We propose a dynamic feature mask for clustering high dimensional data streams. Redundant features are masked and clustering is performed along unmasked, relevant features. If a feature's perceived importance changes, the mask is updated accordingly; previously unimportant features are unmasked and features which lose relevance become masked. The proposed method is algorithm-independent and can be used with any of the existing density-based clustering algorithms which typically do not have a mechanism for dealing with feature drift and struggle with high-dimensional data. We evaluate the proposed method on four density-based clustering algorithms across four high-dimensional streams; two text streams and two image streams. In each case, the proposed dynamic feature mask improves clustering performance and reduces the processing time required by the underlying algorithm. Furthermore, change at the feature level can be observed and tracked.

Index Terms: Data stream clustering, dynamic feature selection, feature drift, feature evolution, unsupervised feature selection.
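
The idea of a dynamic feature mask described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' method: the function names `masked_distance` and `update_mask`, the per-feature `importance` scores, and the fixed `threshold` are hypothetical, and the abstract does not say how feature importance is actually measured.

```python
from math import sqrt
from typing import List, Sequence


def masked_distance(x: Sequence[float], y: Sequence[float], mask: Sequence[bool]) -> float:
    """Euclidean distance computed only along unmasked (True) features."""
    return sqrt(sum((a - b) ** 2 for a, b, keep in zip(x, y, mask) if keep))


def update_mask(importance: Sequence[float], threshold: float) -> List[bool]:
    """Recompute the mask from hypothetical per-feature importance scores.

    Assumption: a feature is kept (True) when its score reaches the
    threshold and masked (False) otherwise.
    """
    return [score >= threshold for score in importance]


# The third feature is masked, so it does not contribute to the distance.
mask = update_mask([0.9, 0.7, 0.1], threshold=0.5)
d = masked_distance([0.0, 0.0, 9.0], [3.0, 4.0, 0.0], mask)

# Later in the stream the third feature gains relevance and is unmasked.
mask_later = update_mask([0.9, 0.7, 0.8], threshold=0.5)
```

Because the mask only changes which coordinates enter the distance, a sketch like this can sit in front of any distance-based clustering routine, which is in the spirit of the abstract's claim that the approach is algorithm-independent.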


SECLEDS: Sequence Clustering in Evolving Data Streams via Multiple Medoids and Medoid Voting

Nadeem, Sicco 2023, Lecture Notes in Computer Science

