<p>This paper presents a new data analysis tool, the Overlapping Clustering Application (OCA), developed to identify overlapping clusters and outliers in an unsupervised manner. OCA works in three phases. The first phase detects abnormal values (outliers) in the dataset using the median absolute deviation. The second phase segments data objects into clusters using the k-means algorithm. The final phase identifies overlapping clusters, using maxdist (the maximum distance of data objects allowed in a cluster) as a predictor of which data objects can belong to multiple clusters. Experimental results showed that OCA successfully detected both overlapping clusters and outliers.</p>
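The three OCA phases can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes Euclidean distance, the modified z-score form of MAD (0.6745 · |x − median| / MAD) with a cutoff of 3.5 (a common convention, not a value given in the abstract), a plain Lloyd's k-means, and maxdist taken as the largest distance from any object to its own centroid. The function names `mad_outliers`, `kmeans`, `overlap_assign`, and `oca` are hypothetical.

```python
import numpy as np


def mad_outliers(X, threshold=3.5):
    """Phase 1 (assumed form): flag rows whose modified z-score exceeds `threshold` on any feature."""
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0)
    # Guard against division by zero when at least half the values in a feature coincide.
    mad = np.where(mad == 0, np.finfo(float).eps, mad)
    z = 0.6745 * np.abs(X - med) / mad
    return np.any(z > threshold, axis=1)


def kmeans(X, k, iters=100, seed=0):
    """Phase 2: plain Lloyd's k-means with randomly chosen initial centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Recompute each centroid; keep the old one if its cluster became empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    # Final labels consistent with the returned centroids.
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dist.argmin(axis=1)


def overlap_assign(X, centroids, labels):
    """Phase 3 (assumed rule): add an object to every cluster whose centroid lies within maxdist."""
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    # maxdist: the largest distance from any object to its own k-means centroid.
    maxdist = dist[np.arange(len(X)), labels].max()
    return dist <= maxdist, maxdist  # boolean (n, k) membership matrix


def oca(X, k, threshold=3.5, seed=0):
    """Hypothetical end-to-end pipeline: remove MAD outliers, cluster, then find overlaps."""
    X = np.asarray(X, dtype=float)
    outliers = mad_outliers(X, threshold)
    inliers = X[~outliers]
    centroids, labels = kmeans(inliers, k, seed=seed)
    membership, maxdist = overlap_assign(inliers, centroids, labels)
    return outliers, membership, maxdist
```

Running the MAD filter before k-means means an extreme point cannot inflate maxdist, which is why the outlier phase comes first in this sketch.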
The improved multi-cluster overlapping k-means extension (IMCOKE) uses the median absolute deviation (MAD) to detect outliers in datasets, which makes the algorithm more effective for overlapping clustering. However, where MAD should be positioned within the algorithm was not analyzed. In this paper, the incorporation of MAD-based outlier detection is analyzed to determine the appropriate point at which outliers should be identified before clustering is applied. The study assumes that the size of clusters and the proximity of clusters to each other affect runtime performance when identifying overlapping clusters. Therefore, the radius of each cluster and the distance between clusters are added as measurements in the algorithm's procedure. The resulting algorithm, eHMCOKE, was evaluated through experiments on synthetic and real datasets using the F1-measure, runtime, and percentage of improvement. The results showed that eHMCOKE takes less time to discover overlapping clusters, with an improvement rate of 22%, and achieved a best F1-measure of 91.5% in identifying overlapping clusters, outperforming IMCOKE. These results indicate that eHMCOKE significantly outperforms the IMCOKE algorithm in most of the tests conducted.
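The abstract does not say exactly how the cluster radius and inter-cluster distance enter eHMCOKE's procedure. The sketch below is one plausible reading, labeled as an assumption: the radius is the largest member-to-centroid distance, the inter-cluster distance is the Euclidean distance between centroids, and two clusters are treated as candidates for overlap only when that distance is at most the sum of their radii, so an overlap search could skip clearly separated pairs. `percent_improvement` shows the usual relative-runtime formula behind a figure such as the reported 22%. All function names are hypothetical.

```python
import numpy as np


def cluster_radii(X, centroids, labels):
    """Assumed definition: radius = largest distance from a member to its centroid."""
    radii = np.zeros(len(centroids))
    for j, c in enumerate(centroids):
        members = X[labels == j]
        if len(members):  # empty clusters keep radius 0
            radii[j] = np.linalg.norm(members - c, axis=1).max()
    return radii


def candidate_pairs(centroids, radii):
    """Assumed rule: clusters i, j may overlap only if their centroid gap <= r_i + r_j."""
    pairs = []
    k = len(centroids)
    for i in range(k):
        for j in range(i + 1, k):
            gap = np.linalg.norm(centroids[i] - centroids[j])
            if gap <= radii[i] + radii[j]:
                pairs.append((i, j))
    return pairs


def percent_improvement(baseline_time, new_time):
    """Runtime reduction relative to the baseline, in percent."""
    return 100.0 * (baseline_time - new_time) / baseline_time
```

For example, a baseline that takes 100 s against a variant that takes 78 s gives a 22% improvement under this formula.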
The MCOKE algorithm assigns a data object to multiple clusters and is known for its simplicity and effectiveness. Its drawback is that it uses maxdist as a global threshold for assigning objects to one or more clusters, and this threshold is sensitive to outliers. Outliers in a dataset can therefore significantly reduce the effectiveness of MCOKE for overlapping clustering. In this paper, median absolute deviation (MAD) outlier detection is incorporated into the MCOKE algorithm so that outliers are detected and removed before they can take part in assigning objects to one or more clusters. Experiments demonstrate that the improved MCOKE algorithm with MAD identifies overlapping clusters better. The performance of the outlier detection was also evaluated using the F1 score. Evaluation results revealed that the outlier detection achieved a higher accuracy rate in identifying outliers when applied to real datasets.
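The F1 score used to evaluate the outlier detector can be computed by treating "outlier" as the positive class and comparing the detector's per-object flags against ground-truth labels. This is a minimal sketch assuming boolean flags for each object; the function name `outlier_f1` and the choice to return 0.0 when there are no positives at all are assumptions, not details from the paper.

```python
def outlier_f1(predicted, actual):
    """F1 score for outlier detection, with 'outlier' as the positive class.

    `predicted` and `actual` are equal-length sequences of booleans
    (True = flagged as an outlier / truly an outlier).
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if a and not p)
    # 2TP / (2TP + FP + FN) equals the harmonic mean of precision and recall.
    denom = 2 * tp + fp + fn
    # Assumed convention: 0.0 when there are no predicted or true outliers at all.
    return 2 * tp / denom if denom else 0.0
```

With two true outliers both caught and one false alarm, precision is 2/3, recall is 1, and F1 is 0.8.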