Data clustering is used in a number of fields, including statistics, bioinformatics, machine learning, exploratory data analysis, image segmentation, security, medical image analysis, web handling, and mathematical programming. Its role is to group data into clusters so that similarity within a cluster is high and dissimilarity between clusters is high. This paper reviews the problems that affect clustering performance for deterministic and stochastic clustering approaches. In deterministic clustering, the problems are caused by sensitivity to the number of clusters provided. In stochastic clustering, the problems are caused either by the absence of an optimal number of clusters or by the projection of the data. The review focuses on ant-based sorting and ACO-based clustering, which suffer from slow convergence, non-robust results, and convergence to local optima. The results of this review can serve as a guide for researchers working in the area of data clustering, as they show the strengths and weaknesses of both clustering approaches.
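To make the within-/between-cluster criterion above concrete, here is a minimal Python sketch (not taken from the review; the function name and toy data are illustrative assumptions) that scores a labeled data set by its mean within-cluster distance and its mean between-centroid distance:

```python
# Hypothetical sketch: quantify the clustering criterion described above,
# i.e. low dissimilarity within clusters and high dissimilarity between them.
import numpy as np

def within_between_dissimilarity(X, labels):
    """Return (mean within-cluster distance, mean between-centroid distance)."""
    centroids = {k: X[labels == k].mean(axis=0) for k in np.unique(labels)}
    within = np.mean([np.linalg.norm(x - centroids[k]) for x, k in zip(X, labels)])
    cents = list(centroids.values())
    between_pairs = [np.linalg.norm(a - b)
                     for i, a in enumerate(cents) for b in cents[i + 1:]]
    between = float(np.mean(between_pairs)) if between_pairs else 0.0
    return within, between

# Toy data: two well-separated groups of two points each.
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
labels = np.array([0, 0, 1, 1])
print(within_between_dissimilarity(X, labels))  # small within, large between
```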
Ant Colony Optimization (ACO) is a generic algorithm that has been widely used in different application domains owing to its simplicity and adaptability to different optimization problems. The key component that governs the search process in this algorithm is the management of its memory model. In contrast to other algorithms, ACO explicitly uses an adaptive memory, which is important for producing optimal results. The algorithm's memory records previously searched regions and is fully responsible for transferring the neighborhood of the current structures to the next iteration. Ant Colony Optimization for Clustering (ACOC) is a nature-inspired swarm algorithm that treats clustering as an optimization problem. However, ACOC defines an implicit memory (the pheromone matrix) that is unable to retain information about an ant's previous movements. The problem arises because ACOC is a centroid-label clustering algorithm, in which the relationship between a centroid and an instance is unstable: the label assigned to a centroid changes from one iteration to the next as the centroid values change. Thus, pheromone values are lost because they are associated with the label (position) of the centroid, and ACOC cannot transfer the current clustering solution to the next iterations because the history of the search is lost during the algorithm run. This study proposes a new centroid memory for data clustering (A-ACOC) that can retain the information of a previous clustering solution. This is possible because the pheromone is associated with the adaptive instance rather than with the label of the centroid, and centroids are identified based on the adaptive instance route. A comparison with several common clustering algorithms on real-world data sets shows that the accuracy of the proposed algorithm surpasses that of its counterparts.
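As a rough illustration of the memory issue described above (a sketch under assumed data structures, not the authors' implementation), the following contrasts a pheromone matrix keyed by cluster label, whose columns lose meaning when labels are reshuffled, with one keyed by candidate-centroid instances, whose entries survive label changes:

```python
# Hypothetical sketch (not the A-ACOC code): two ways of keying pheromone memory.
import numpy as np

n_instances, n_clusters = 6, 2

# Label-based memory: column j means "cluster j", whose centroid value changes
# every iteration, so previously deposited pheromone no longer matches anything.
pheromone_by_label = np.ones((n_instances, n_clusters))

# Instance-based memory: entry (i, c) means "instance i follows candidate-centroid
# instance c"; the pairing is independent of how clusters happen to be labeled.
pheromone_by_instance = np.ones((n_instances, n_instances))

def deposit(pheromone, pairs, amount=0.5, evaporation=0.1):
    """Evaporate all entries, then reinforce the given (row, column) pairs."""
    pheromone *= (1.0 - evaporation)
    for i, j in pairs:
        pheromone[i, j] += amount
    return pheromone

# Reinforce instances 0-2 toward candidate centroid 0 and instances 3-5 toward 4;
# these entries keep their meaning in later iterations even if labels are permuted.
deposit(pheromone_by_instance,
        [(i, 0) for i in range(3)] + [(i, 4) for i in range(3, 6)])
```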
Outlier detection is one of the most important issues in the field of data analysis owing to its applicability in several well-known problem domains, including intrusion detection, security, banking, fraud detection, and the discovery of criminal activities in electronic commerce. Anomaly detection comprises two main approaches: supervised and unsupervised. The supervised approach requires predefined information, namely the type of outliers, which is difficult to specify in some applications. The unsupervised approach, in contrast, determines outliers without human interaction. This paper presents a review of the unsupervised approach, showing its main advantages and limitations in light of the studies performed on the supervised approach. The study indicates that the main problem of the unsupervised approach, related to algorithm parameterization, is determining local and global outlier objects simultaneously. Moreover, most algorithms neither rank objects nor identify the degree to which an object is an outlier or normal, and they require different parameter settings from the researcher; examples of such parameters are the radius of the neighborhood, the number of neighbors within the radius, and the number of clusters. This comprehensive and structured overview of a large set of outlier detection algorithms, which emphasizes the limitations of outlier detection in the unsupervised approach, can be used as a guideline by researchers interested in this field.
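As an example of the parameter sensitivity discussed above, the following minimal sketch (illustrative only, not drawn from any of the reviewed algorithms) scores each point by its mean distance to its k nearest neighbors; the choice of k is exactly the kind of setting the review flags as hard to fix in advance:

```python
# Hypothetical sketch of an unsupervised, neighborhood-based outlier score.
import numpy as np

def knn_outlier_scores(X, k=3):
    """Score each point by its mean distance to its k nearest neighbors."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)          # ignore self-distance
    nearest = np.sort(dists, axis=1)[:, :k]  # k smallest distances per point
    return nearest.mean(axis=1)              # higher score = more outlying

# Toy data: four clustered points and one far-away point.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [8, 8]], dtype=float)
print(knn_outlier_scores(X, k=3))  # the point [8, 8] receives the largest score
```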
Ant colony optimization (ACO) is a meta-heuristic algorithm inspired by the foraging behavior of real ant colonies. It is a population-based method employed in different optimization problems such as classification, image processing, and clustering. This paper sheds light on improving the results that the algorithm produces for the traveling salesman problem. The key to producing valuable results lies in two important components: exploration and exploitation. Balancing these two components is the foundation of controlling the search within ACO. This paper proposes a modification of the main probabilistic rule to overcome the drawbacks of the exploration problem and to produce globally optimal results in high-dimensional spaces. Experiments on six variants of ant colony optimization indicate that the proposed work produces high-quality results in terms of the shortest route.
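For context, the standard ACO random-proportional transition rule that such modifications target can be sketched as follows (a generic illustration, not the paper's modified rule; alpha, beta, and the toy distance matrix are assumptions):

```python
# Sketch of the classic ACO transition rule: choose the next city with
# probability proportional to pheromone^alpha * heuristic^beta.
import numpy as np

def next_city(current, unvisited, tau, eta, alpha=1.0, beta=2.0, rng=None):
    """Roulette-wheel selection over the unvisited cities."""
    rng = rng or np.random.default_rng()
    weights = (tau[current, unvisited] ** alpha) * (eta[current, unvisited] ** beta)
    probs = weights / weights.sum()
    return rng.choice(unvisited, p=probs)

# Toy 4-city instance: eta is the heuristic desirability (1 / distance).
dist = np.array([[0, 2, 9, 10],
                 [2, 0, 6, 4],
                 [9, 6, 0, 3],
                 [10, 4, 3, 0]], dtype=float)
eta = np.where(dist > 0, 1.0 / np.where(dist > 0, dist, 1.0), 0.0)
tau = np.ones_like(dist)                      # uniform initial pheromone
print(next_city(0, [1, 2, 3], tau, eta))      # nearer cities are chosen more often
```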
A fundamental problem in data clustering is how to determine the correct number of clusters. The k-adaptive medoid set ant colony optimization (ACO) clustering (METACOC-K) algorithm performs well in solving clustering problems. However, METACOC-K does not guarantee finding the best number of clusters: it assumes the number of clusters based on an adaptive parameter strategy that lacks feedback learning. This restrains the algorithm from producing compact clusters and the optimal number of clusters. In this paper, a self-adaptive ACO clustering (S-ACOC) algorithm is proposed to produce the optimal number of clusters by incorporating a self-adaptive parameter strategy. S-ACOC is a centroid-based algorithm that automatically adjusts the number of clusters during the algorithm run. The selection of the number of clusters is based on a construction graph that reflects the influence of the pheromone on the algorithm's learning. Experiments were conducted on real-world datasets to evaluate the performance of the proposed algorithm. External evaluation metrics (purity, F-measure, and entropy) were used to compare the results of the proposed algorithm with those of other swarm clustering algorithms, including a genetic algorithm (GA), particle swarm optimization (PSO), and METACOC-K. The results show that S-ACOC provides higher purity (50%) and lower entropy (40%) than GA, PSO, and METACOC-K. Experiments were also performed with several predefined numbers of clusters, and the results demonstrate that S-ACOC is superior to GA, PSO, and METACOC-K. Based on this superior performance, S-ACOC can be used to solve clustering problems in various application domains.
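For reference, the external purity and entropy measures cited above can be computed from predicted cluster assignments and ground-truth labels roughly as follows (a generic sketch, not the paper's evaluation code; the toy labels are assumptions):

```python
# Hypothetical sketch of the purity and entropy external evaluation metrics.
import numpy as np

def purity_and_entropy(clusters, truth):
    """Higher purity and lower entropy indicate better agreement with truth."""
    clusters, truth = np.asarray(clusters), np.asarray(truth)
    n = len(truth)
    purity, entropy = 0.0, 0.0
    for c in np.unique(clusters):
        members = truth[clusters == c]
        counts = np.array([np.sum(members == t) for t in np.unique(members)])
        p = counts / counts.sum()
        purity += counts.max() / n                              # dominant class share
        entropy += (counts.sum() / n) * (-(p * np.log2(p)).sum())  # size-weighted
    return purity, entropy

# Toy example: three clusters over nine points with three true classes.
pred = [0, 0, 0, 1, 1, 1, 2, 2, 2]
true = ['a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'c']
print(purity_and_entropy(pred, true))
```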