Graphical/Tabular Abstract
Highlights: Fully online data stream clustering; evolutionary-based clustering; adaptive radius; time-based summarization; memory for the past status of clusters.
Figure A. Comparison of clustering quality and run-time complexity of algorithms on the KDD dataset
Purpose: The aim of this article is to propose a new data stream clustering algorithm that has an adaptive radius, adapts to the evolutionary structure of streaming data, and works in a fully online manner.
Theory and Methods: In this study, a kd-tree is used to form and split clusters, an adaptive radius approach is used to let clusters grow and shrink, and the active/inactive status of clusters is used to adapt to the evolutionary structure of streaming data. All operations are performed online. To create a new cluster, the data points that do not belong to any cluster are placed in a kd-tree, and a range search is performed on them according to two predefined parameters: r (the radius of a candidate cluster) and N (the number of data points required within that area). After a cluster is formed, its radius can be increased or decreased over time as necessary. Because the structure of streaming data changes dynamically, some clusters may be split and some may be merged over time. Unlike existing approaches in the literature, clusters can be inactivated and later reactivated, so that clusters formed in the same region at different time intervals are identified with the same cluster label, in keeping with the nature of streaming data. This feature increases the clustering quality of the proposed method. A summarization method that consists of a time window and a sliding window is used to support time-based summarization without reducing performance.
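The cluster formation step described above can be illustrated with a minimal sketch. It is not the KD-AR Stream implementation: the function names (`build_kdtree`, `range_search`, `try_form_cluster`), the choice of the mean of the neighbours as the cluster center, and the initial radius taken as the largest member distance are all assumptions made for illustration. Only the use of a kd-tree, a range search with radius r, and the count threshold N (here `n_min`) come from the description.

```python
import math

def build_kdtree(points, depth=0):
    """Build a kd-tree as nested tuples (point, left, right); None for an empty subtree."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (pts[mid],
            build_kdtree(pts[:mid], depth + 1),
            build_kdtree(pts[mid + 1:], depth + 1))

def range_search(node, query, r, depth=0, found=None):
    """Collect every stored point within Euclidean distance r of query."""
    if found is None:
        found = []
    if node is None:
        return found
    point, left, right = node
    axis = depth % len(query)
    if math.dist(point, query) <= r:
        found.append(point)
    diff = query[axis] - point[axis]
    # Always descend into the side that contains the query; visit the
    # other side only if the search ball crosses the splitting plane.
    near, far = (left, right) if diff < 0 else (right, left)
    range_search(near, query, r, depth + 1, found)
    if abs(diff) <= r:
        range_search(far, query, r, depth + 1, found)
    return found

def try_form_cluster(unassigned, r, n_min):
    """Return a new cluster if some unassigned point has at least n_min
    points (itself included) within radius r; otherwise return None.
    Center and initial radius are illustrative assumptions."""
    tree = build_kdtree(unassigned)
    for candidate in unassigned:
        members = range_search(tree, candidate, r)
        if len(members) >= n_min:
            dim = len(candidate)
            center = tuple(sum(p[i] for p in members) / len(members)
                           for i in range(dim))
            radius = max(math.dist(center, p) for p in members)
            return {"center": center, "radius": radius, "members": members}
    return None

# Three nearby points and one distant point that stays unassigned.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
cluster = try_form_cluster(points, r=0.5, n_min=3)
```

In this sketch the tree is rebuilt on every call for simplicity; an online implementation would more plausibly maintain the tree incrementally as unassigned points arrive and are removed.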
Results: To verify the effectiveness of the KD-AR Stream algorithm, it is compared with SE-Stream, DPStream, and CEDAS on a variety of well-known datasets in terms of clustering quality and run-time complexity. The results show that KD-AR Stream outperforms the other algorithms, achieving higher clustering quality in a reasonable run time, as shown in Fig. A.
Conclusion: The aim of this study is to propose a novel data stream clustering algorithm that adapts to the dynamic structure of streaming data. This aim is achieved through five evolutionary processes: appearance, activation/inactivation, self-evolution, merge, and split. According to the results, the proposed method performs well against the compared algorithms in terms of both clustering quality and run-time complexity.