Abstract: This paper introduces an optimized version of the standard K-Means algorithm. The optimization targets running time and stems from the observation that, after a certain number of iterations, only a small fraction of the data elements change cluster, so there is no need to redistribute all of them. The implementation proposed in this paper therefore draws a boundary between the data elements that will not change cluster during the next iteration and those that might, significantly reducing the workload for very large data sets. The prototype implementation showed up to a 70% reduction in running time.
Abstract: In a previous paper [1] we introduced an optimized version of the K-Means algorithm. Unlike the standard K-Means algorithm, which traverses the entire data set at every iteration to decide which cluster each data item belongs to, the proposed optimization relies on the observation that after only a few iterations the centroids get very close to their final positions, so only a few data items still switch clusters. Consequently, after a small number of iterations, most of the processing time is wasted on checking items that have already reached their final cluster. The optimized algorithm instead re-checks, at each iteration, only the data items that might switch clusters because of the centroids' displacement. The prototype implementation was evaluated on data produced by a uniformly distributed random number generator, and the evaluation showed up to a 70% reduction in running time. This paper evaluates the optimized K-Means against real data sets from different domains.
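The idea of re-checking only the items that might switch clusters can be illustrated with a minimal sketch. The abstract does not state the exact rule used to decide which items are at risk, so the rule below is an assumption chosen for illustration: each item keeps a lower bound on its margin (distance to the second-nearest centroid minus distance to the nearest one), and it is re-checked only once that bound is used up by the accumulated centroid displacement. The names `filtered_kmeans`, `nearest_two`, `mean`, and `margin` are hypothetical and do not come from the paper.

```python
import math
import random


def nearest_two(p, centroids):
    """Return (index of nearest centroid, nearest distance, second-nearest distance)."""
    dists = sorted((math.dist(p, c), j) for j, c in enumerate(centroids))
    return dists[0][1], dists[0][0], dists[1][0]


def mean(members, fallback):
    """Component-wise mean of the points; keep the old centroid if the cluster is empty."""
    if not members:
        return fallback
    return tuple(sum(xs) / len(members) for xs in zip(*members))


def filtered_kmeans(points, k, max_iter=100, seed=0):
    """K-Means (k >= 2) that re-checks only the points whose cluster might change.

    Returns (centroids, labels, rechecked), where rechecked[t] is the number
    of points whose distances were recomputed in iteration t.
    """
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    n = len(points)
    labels = [0] * n
    margin = [0.0] * n        # assumed rule: lower bound on (second-nearest - nearest)
    active = list(range(n))   # the first pass checks every point
    rechecked = []
    for _ in range(max_iter):
        # Distance computations are done for the active points only.
        for i in active:
            labels[i], d1, d2 = nearest_two(points[i], centroids)
            margin[i] = d2 - d1
        rechecked.append(len(active))

        new_centroids = [
            mean([points[i] for i in range(n) if labels[i] == j], centroids[j])
            for j in range(k)
        ]
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift == 0.0:
            break  # centroids are stable, so the assignment is final

        # Worst case: the point's own centroid moves away by `shift` while a
        # rival centroid moves closer by `shift`, shrinking the margin by 2 * shift.
        for i in range(n):
            margin[i] -= 2.0 * shift
        active = [i for i in range(n) if margin[i] <= 0.0]
    return centroids, labels, rechecked
```

Under this assumed rule the filter is conservative: a point is skipped only when the triangle inequality guarantees that its nearest centroid cannot have changed, so the resulting assignment should match what a full pass over the data would produce. The sketch still visits every point when averaging the centroids and updating the margins, but those steps are cheap compared with computing the distance from every point to every centroid.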