Abstract
Over the last decade, the collective intelligent behavior of groups of animals, birds, or insects has attracted the attention of researchers. Swarm intelligence is the branch of artificial intelligence that deals with the implementation of intelligent systems by taking inspiration from the collective behavior of social insects and other animal societies. Many meta-heuristic algorithms based on the collective behavior of swarms, which emerges from complex interactions without supervision, have been used to solve complex optimization problems. Data clustering organizes data into groups called clusters, such that each cluster contains similar data. The resulting clusters may be disjoint. Accuracy and efficiency are important measures in data clustering. Several recent studies describe bio-inspired systems as information processing systems capable of some cognitive ability. However, existing popular bio-inspired algorithms for data clustering do not maintain a good balance between exploration and exploitation, which is needed to produce better clustering results. In this article, we propose a bio-inspired algorithm, namely social spider optimization (SSO), for clustering that maintains a good balance between exploration and exploitation using female and male spiders, respectively. We compare the results of the proposed SSO algorithm with K means and other nature-inspired algorithms such as particle swarm optimization (PSO), ant colony optimization (ACO), and improved bee colony optimization (IBCO). We find it to be more robust, as it produces better clustering results. Although SSO solves the problem of getting stuck in a local optimum, it needs to be modified to locate the best solution in the proximity of the generated global solution. Hence, we hybridize SSO with K means, which produces good results in local searches. 
We compare the proposed hybrid algorithms SSO+K means (SSOKC), integrated SSOKC (ISSOKC), and interleaved SSOKC (ILSSOKC) with K means+PSO (KPSO), K means+genetic algorithm (KGA), K means+artificial bee colony (KABC), and interleaved K means+IBCO (IKIBCO), and find that the proposed algorithms produce better clustering results. We use the sum of intra-cluster distances (SICD), average cosine similarity, accuracy, and inter-cluster distance to measure and validate the performance and efficiency of the proposed clustering techniques.
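
As an illustration of the SICD measure named above, the following is a minimal sketch, assuming SICD is the sum of (unsquared) Euclidean distances from each data instance to its nearest centroid. The abstract does not state the exact formula, and the function names `assign` and `sicd` and the toy points are hypothetical, not taken from the paper.

```python
import math

def assign(points, centroids):
    """Assign each point the index of its nearest centroid (Euclidean distance)."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]

def sicd(points, centroids, labels):
    """Sum of distances from each point to the centroid of its assigned cluster.

    Assumption: unsquared Euclidean distance; some works use squared distance.
    """
    return sum(math.dist(p, centroids[k]) for p, k in zip(points, labels))

# Toy data for illustration only (not from the paper).
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 4.0)]
centroids = [(0.0, 1.0), (10.0, 2.0)]

labels = assign(points, centroids)
total = sicd(points, centroids, labels)
print(labels, total)
```

A lower SICD indicates more compact clusters, which is why a candidate solution (a set of centroids) with a smaller value would be preferred by an optimizer that uses this measure as its fitness function.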
Nature-inspired algorithms are based on the concepts of self-organization and complex biological systems. They have been designed by researchers and scientists to solve complex problems in various environmental situations by observing how naturally occurring phenomena behave. The introduction of nature-inspired algorithms has led to new branches of study such as neural networks, swarm intelligence, evolutionary computation, and artificial immune systems. Particle swarm optimization (PSO), social spider optimization (SSO), and other nature-inspired algorithms have had some success in solving clustering problems, but they may converge to local optima because of a lack of balance between exploration and exploitation. In this paper, we propose a novel implementation of SSO, namely social spider optimization for data clustering using single centroid representation and enhanced mating operation (SSODCSC), to improve the balance between exploration and exploitation. In SSODCSC, we implemented each spider as a collection of a centroid and the data instances close to it. We allowed non-dominant male spiders to mate with female spiders by converting them into dominant males. We found that SSODCSC produces better values for the sum of intra-cluster distances, the average CPU time per iteration (in seconds), accuracy, the F-measure, and the average silhouette coefficient than K-means and other nature-inspired techniques. When the proposed algorithm is compared with other nature-inspired algorithms on the Patent corpus datasets, the overall percentage increase in accuracy is approximately 13%. When it is compared with other nature-inspired algorithms on the UCI datasets, the overall percentage increase in the F-measure value is approximately 10%. For completeness, the best K cluster centroids (the best K spiders) returned by SSODCSC are specified. 
To show the significance of the proposed algorithm, we conducted a one-way ANOVA test on the accuracy values and the F-measure values returned by the clustering algorithms.
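
The one-way ANOVA test mentioned above compares between-group variance with within-group variance. The following is a minimal sketch of the F statistic, assuming each group holds the scores (for example, accuracy values) of one clustering algorithm over repeated runs. The function name `one_way_anova_f` is hypothetical, and the numbers are placeholders chosen for easy arithmetic, not results from the paper.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of numbers.

    F = (SSB / (k - 1)) / (SSW / (n - k)), where k is the number of groups
    and n is the total number of observations.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their group mean.
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ssb / (k - 1)) / (ssw / (n - k))

# Placeholder scores for three hypothetical algorithms (illustration only).
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_stat = one_way_anova_f(groups)
print(f_stat)
```

A large F value relative to the critical value of the F distribution with (k - 1, n - k) degrees of freedom suggests that at least one algorithm's mean score differs from the others.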