Density-based spatial clustering of applications with noise (DBSCAN) is a widely used algorithm that groups points lying in densely connected regions and can discover clusters of arbitrary shape in noisy data sets. However, it suffers from slow computation caused by large-scale disk I/O, and from clustering bias caused by clusters of uneven density and poor parameter search ability. To address these problems, a parallel density clustering algorithm based on an improved fruit fly optimization algorithm and Spark memory iteration is proposed. The proposed algorithm first partitions the data into grid cells using an irregular dynamic density region partitioning strategy. Then, a hybrid fruit fly and particle swarm algorithm based on a genetic optimization mechanism dynamically searches for the parameters used in local clustering, which improves the local clustering results. Finally, a custom cluster merging strategy performs the local merging of samples in irregularly bounded grid cells under each partition. Experiments show that the improved algorithm is generally applicable to clusters of different shapes and to larger-scale data, with clear improvements in accuracy and parallel efficiency.
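The parameter sensitivity that motivates the optimization step can be illustrated with a minimal sketch of plain density-based clustering. This is not the proposed parallel algorithm: the abstract gives no implementation details, so the function names (`region_query`, `dbscan`), the parameters (`eps`, `min_pts`), and the toy data are all assumptions chosen for illustration.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
    return labels


# Hypothetical toy data: two dense squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]

good = dbscan(points, eps=1.5, min_pts=3)       # two clusters plus one noise point
too_small = dbscan(points, eps=0.5, min_pts=3)  # every point is labeled noise
too_large = dbscan(points, eps=20, min_pts=3)   # everything collapses into one cluster
```

The same data yields two clusters, no clusters, or a single cluster depending only on `eps`, which is the kind of sensitivity that an automated parameter search is meant to address.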