Nowadays, data streams are massive and change rapidly. Their uses range from basic logical and scientific applications to critical business and financial ones. In the online phase, useful information is extracted from the stream and represented as micro-clusters. In the offline phase, the micro-clusters are merged to form macro-clusters. The DBSTREAM technique captures the density between micro-clusters by means of a shared density graph in the online phase. The density information in this graph is then used during reclustering to improve cluster formation, but DBSTREAM takes more time to handle corrupted data points. In this paper, an early pruning algorithm is applied before pre-processing of the data, and a Bloom filter is used to recognize corrupted data. Our experiments on real-time datasets show that this approach improves the efficiency of macro-clusters by 90% and generates more micro-clusters in a short time.

Index: Data Stream Clustering, Density-based Clustering.

I. INTRODUCTION

Clustering is a standard and important technique of exploratory data mining. It divides a set of data into several groups (also called clusters) such that items in the same group are more similar to each other, in some sense, than to items in other groups. A data stream is a continuous flow of data whose size is unbounded [2][10]. Many applications produce this type of streaming data, such as GPS data from vehicles, web click-stream data, computer network monitoring, and sensor readings. Data stream clustering is performed to gain a better understanding of the data. Clustering algorithms and their parameter settings depend on the individual data sets.
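The abstract's idea of pruning early and using a Bloom filter to recognize corrupted data can be illustrated with a minimal sketch. The excerpt does not say how records are keyed or how corruption is detected, so everything here is an assumption made for illustration: the `BloomFilter` class, its size and hash count, the `prune_stream` helper, and the rule that a record with a missing value counts as corrupted. It is not the authors' implementation.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; bit count and hash scheme are illustrative assumptions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        # True may be a false positive; False is always correct.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def prune_stream(points, is_corrupted, known_bad):
    """Yield points that pass the check; remember corrupted keys in known_bad."""
    for point in points:
        key = repr(point)  # hypothetical keying scheme
        if key in known_bad:
            continue  # previously flagged as corrupted (or a rare false positive)
        if is_corrupted(point):
            known_bad.add(key)
            continue
        yield point


def has_missing_value(point):
    # Hypothetical corruption rule: any missing coordinate.
    return any(value is None for value in point)


bad = BloomFilter()
stream = [(1.0, 2.0), (None, 3.0), (1.5, 2.5), (None, 3.0)]
clean = list(prune_stream(stream, has_missing_value, bad))
```

In this sketch the filter only saves work when the corruption check is expensive and the same corrupted record recurs. Because a Bloom filter can return false positives, a small fraction of valid points may be dropped, which is a trade-off a real design would need to weigh.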
Data stream clustering algorithms process data quickly to provide timely results, detect when clusters should appear or disappear, and identify outliers. Data streams can be clustered with grid-based algorithms such as D-Stream [1], density-based algorithms such as DBSTREAM [2], or partitioning-based algorithms such as k-means. The primary goal of this paper is to improve the quality of the final clusters and to reduce the time needed to generate the micro-clusters.

II. RELATED WORK

From an application point of view, one-pass clustering algorithms are of limited use because outdated data degrades cluster quality. CluStream is an effective and efficient method that characterizes data streams over different time horizons. Its micro-clusters are stored as snapshots in a pyramidal time window [5]. However, it cannot find arbitrarily shaped clusters and cannot handle outliers [6]. The density-based clustering algorithm DBSCAN is used to find clusters of arbitrary shape in large spatial databases with noise, and it requires only one input parameter. It counts the data points in a neighborhood to estimate density, using the eps and MinPts parameters, and identifies the core, border, and noise points [3]. The disadvant...