2022
DOI: 10.21203/rs.3.rs-2388679/v1
Preprint

Geometry-Inference based Clustering-Heuristic: An empirical method for kmeans optimal clusters determination

Abstract: Kmeans is one of the most widely used algorithms in data clustering. A number of metrics are coupled with kmeans to cluster data, with the aim of enhancing both the local compactness of clusters and their global separation. Before the final assignment of data to their corresponding clusters, selecting the optimal number of clusters is therefore a crucial step in the clustering process. The present work aims to build a new clustering metric/heuristic that takes into account b…
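The abstract is cut off before it describes the proposed heuristic, so the sketch below does not attempt to reproduce it. It only illustrates the general idea the abstract states: run kmeans for several candidate cluster counts and keep the count whose result best balances compactness and separation. The mean silhouette coefficient is used here as a stand-in index, and the function names (`farthest_first`, `kmeans`, `silhouette`, `best_k`, `make_blobs`), the farthest-point seeding, and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def farthest_first(points, k):
    """Deterministic seeding (an assumption): start at the first point,
    then repeatedly add the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=100):
    """Plain Lloyd iterations; returns one cluster label per point."""
    centers = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean of (b - a) / max(a, b), where a measures compactness
    (mean distance inside the own cluster) and b measures separation
    (mean distance to the nearest other cluster)."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def best_k(points, k_values):
    """Score every candidate k and return the best one with all scores."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


def make_blobs(centers, n=30, spread=0.5, seed=1):
    """Synthetic 2-D Gaussian blobs (illustrative data, not from the paper)."""
    rng = random.Random(seed)
    return [(cx + rng.gauss(0, spread), cy + rng.gauss(0, spread))
            for cx, cy in centers for _ in range(n)]


points = make_blobs([(0, 0), (10, 0), (5, 9)])
chosen_k, scores = best_k(points, range(2, 7))
print(chosen_k, {k: round(s, 3) for k, s in scores.items()})
```

On three well-separated blobs the silhouette peaks at three clusters, because merging two blobs hurts compactness and splitting one hurts separation. Other validity indexes can be swapped in by replacing `silhouette`; the selection loop in `best_k` stays the same.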

Cited by 1 publication (6 citation statements)
References 30 publications
“…Similar studies were performed regarding the simple-effect of the normalization by number of researches such as [27] and [28], and other were dedicated to the simple effect of the clusters' shapes such as Kłopotek et al 2020 which proved kmeans clusters should be hyper(ball)-shaped ones to converge to the global optimum [25], similar statement was proposed by Qiu 2010 [29]. The results of Kłopotek et al 2020 [25] are in total concordance with those of El Khattabi et al 2022 [26] since it was found that normal (Gaussian) standardization are well adapted to Gaussian data-shapes, named in the EL Khattabi's paper as Likely-Gaussian datasets, and in Kłopotek's paper as hyper(ball)-shaped data. Similarly, Hennig 2022 studied nine clustering methods by means of several cluster validation indexes [30], The author measured various individual aspects of the data sets such the scales of data, the clusters separation criterion, and the datasets shapes as mainly the closeness to spatial Gaussian distribution, and so forth.…”
Section: Introduction (supporting)
confidence: 65%
“…In a previous work, the authors of the present paper experimentally proved the importance of data preparation in terms of normalization, and the importance of the data dispersion which was qualified as space data shape, then, these two characteristics were combined with different kmeans metrics for a series of datasets. The findings clearly showed the tri-fold interplay between the latter parameters but also the important sensitivity of these latter on the clustering results [26]. Similar studies were performed regarding the simple-effect of the normalization by number of researches such as [27] and [28], and other were dedicated to the simple effect of the clusters' shapes such as Kłopotek et al 2020 which proved kmeans clusters should be hyper(ball)-shaped ones to converge to the global optimum [25], similar statement was proposed by Qiu 2010 [29].…”
Section: Introduction (mentioning)
confidence: 88%