Distributed clustering of categorical data using the information bottleneck framework

Tagasovska, Natasa; Andritsos, Periklis

doi:10.1016/j.is.2017.10.006

Cited by 5 publications

(2 citation statements)

References 4 publications

“…Based on the description of the MST-DC clustering algorithm in the previous sections, the time complexity of MST-DC depends on the following parts: (1) we use the natural neighbor algorithm optimized by KD-tree [30] to obtain the reverse nearest neighbors of each data point, the natural eigenvalue, and the Euclidean distance of the data points, and its time complexity is O(n log(n)); (2) the process of extracting core points is equivalent to traversing data points, and its time complexity is O(n); (3) the time complexity of clustering the core points based on the minimum spanning tree is mainly focused on the Prim algorithm to establish the minimum spanning tree. …
(5) if value(e) > cutθ then
(6) cut this edge;
(7) end if
(8) end for
(9) for each object p in RCore do
(10) if CL(p) == 0 then
(11) ClusterID = max(CL) + 1;
(12) TreeCP = {q | (p, q) ∈ TreeEdge};
(13) CL(p) = ClusterID;
(14) while ∃x ∈ TreeCP && CL(x) == 0 do
(15) …
(17) end while
(18) end if
(19) end for
(20) Return CL…”

Section: E Complexity Analysis (mentioning)

confidence: 99%

“…These algorithms can be roughly classified into four categories: partition-based clustering algorithms [4, 5], hierarchical clustering algorithms [6, 7], density-based clustering algorithms [8, 9], and graph-based clustering algorithms [10–12]. Thanks to the predominant capability of discovering clusters of different shapes and sizes along with outliers, density-based and partition-based clustering technologies are widely used in the fields of health care [13], information security [14], the Internet [15], etc. Besides, clustering is also a vital key for analyzing big data.…”

Section: Introduction (mentioning)

confidence: 99%

A Novel Minimum Spanning Tree Clustering Algorithm Based on Density Core

Gao, Xiong, et al. 2022

Computational Intelligence and Neuroscience

Clustering analysis is an unsupervised learning method with applications across many fields, such as pattern recognition, machine learning, information security, and image segmentation. The density-based method, one of the various clustering approaches, has achieved good performance. However, it works poorly on multidensity and complex-shaped datasets. Moreover, its result depends heavily on the input parameters. Thus, we propose a novel clustering algorithm (called MST-DC) in this paper, which is based on the density core. Firstly, we employ the reverse nearest neighbors to extract core objects. Secondly, we use the minimum spanning tree algorithm to cluster the core objects. Finally, the remaining objects are assigned to the cluster to which their nearest core object belongs. The experimental results on several synthetic and real-world datasets show the superiority of MST-DC over Kmeans, DBSCAN, DPC, DCore, SNNDPC, and LDP-MST.
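
The three steps named in the abstract (extract core objects from reverse nearest neighbors, cluster the cores with a minimum spanning tree, attach the remaining objects to their nearest core) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the fixed neighbor count `k`, the rule that a point is a core when at least `k` other points list it among their neighbors, the fixed edge-length threshold `cut`, and all function names are assumptions made here for illustration. The cited paper instead uses a KD-tree-optimized natural neighbor search, whereas this sketch uses brute-force distances.

```python
import math
from collections import defaultdict


def knn(points, k):
    """Brute-force k nearest neighbours (as indices) of every point."""
    result = []
    for i, p in enumerate(points):
        order = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: math.dist(p, points[j]))
        result.append(order[:k])
    return result


def core_points(points, k):
    """Assumed rule: a point is a core if >= k points list it in their kNN."""
    reverse_count = defaultdict(int)
    for neighbours in knn(points, k):
        for j in neighbours:
            reverse_count[j] += 1
    return [i for i in range(len(points)) if reverse_count[i] >= k]


def prim_mst(points, nodes):
    """Prim's algorithm on the complete Euclidean graph over `nodes`."""
    if not nodes:
        return []
    # best[v] = (shortest known distance from v to the tree, tree endpoint)
    best = {v: (math.inf, None) for v in nodes[1:]}
    current, edges = nodes[0], []
    while best:
        for v in best:
            d = math.dist(points[current], points[v])
            if d < best[v][0]:
                best[v] = (d, current)
        nxt = min(best, key=lambda v: best[v][0])
        d, parent = best.pop(nxt)
        edges.append((parent, nxt, d))
        current = nxt
    return edges


def mst_core_clustering(points, k, cut):
    """Return one cluster label (1, 2, ...) per point."""
    cores = core_points(points, k)
    # Keep only MST edges no longer than the assumed threshold `cut`.
    adjacency = defaultdict(list)
    for u, v, d in prim_mst(points, cores):
        if d <= cut:
            adjacency[u].append(v)
            adjacency[v].append(u)
    labels = [0] * len(points)  # 0 means "not yet labelled"
    cluster_id = 0
    for c in cores:
        if labels[c] == 0:
            cluster_id += 1
            labels[c] = cluster_id
            stack = [c]
            while stack:  # flood-fill one tree component
                u = stack.pop()
                for v in adjacency[u]:
                    if labels[v] == 0:
                        labels[v] = cluster_id
                        stack.append(v)
    # Remaining objects inherit the label of their nearest core.
    for i in range(len(points)):
        if labels[i] == 0 and cores:
            nearest = min(cores, key=lambda c: math.dist(points[i], points[c]))
            labels[i] = labels[nearest]
    return labels
```

Calling `mst_core_clustering(points, k=2, cut=3.0)` on a list of coordinate tuples returns one integer label per point. Both `k` and `cut` are hand-picked in this sketch, which is exactly the kind of parameter dependence the abstract describes as a weakness of density-based methods.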
