2022
DOI: 10.1063/5.0078793

Elucidating the solution structure of the K-means cost function using energy landscape theory

Abstract: The K-means algorithm, routinely used in many scientific fields, generates clustering solutions that depend on the initial cluster coordinates. The number of solutions may be large, which can make locating the global minimum challenging. Hence, the topography of the cost function surface is crucial to understanding the performance of the algorithm. Here, we employ the energy landscape approach to elucidate the topography of the K-means cost function surface for Fisher’s Iris dataset. For any number of clusters…
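The abstract's first point, that K-means solutions depend on the initial cluster coordinates, can be illustrated with a minimal sketch. It uses plain Lloyd iterations on a hypothetical four-point toy dataset (not Fisher's Iris data, and not the energy landscape method of the paper); the function names `sq_dist` and `kmeans` and the two starting configurations are assumptions chosen for illustration only.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            for p in points]


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations from the given initial centers.

    Returns the converged centers, the labels, and the K-means cost
    (sum of squared distances from each point to its assigned center).
    """
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                # Move the center to the mean of its assigned points.
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                # Keep an empty cluster's center where it is.
                new_centers.append(centers[k])
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(points, centers)
    cost = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, cost


# Hypothetical toy data: the corners of a 4 x 1 rectangle.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Two different starting configurations for K = 2.
_, labels_a, cost_a = kmeans(points, [(0, 0), (4, 0)])  # left/right split
_, labels_b, cost_b = kmeans(points, [(2, 0), (2, 1)])  # bottom/top split

print(labels_a, cost_a)
print(labels_b, cost_b)
```

Both runs converge, but to different partitions with different costs: the second start is trapped in a higher-cost solution. This is the kind of dependence on starting coordinates that motivates studying the cost function surface as a whole.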

Cited by 9 publications
(8 citation statements)
References 72 publications
“…Importantly, since a loss function is ubiquitous in any machine learning system, consideration of the LL is applicable in a broad range of fields. While most work has so far considered neural networks, 20,31 LLs have also been analysed in the context of clustering methods, such as K-means 32 or for Gaussian processes 33 in Bayesian machine learning. 34 These contributions suggest another useful property of the energy landscapes view of machine learning.…”
Section: Motivation: EL4ML
mentioning
confidence: 99%
“…[67][68][69] These methods require continuous first and second derivatives, but even for loss functions without these properties, such as K-means, landscapes can still be explored using algorithmic adaptations. 32 These geometry optimisation tools have been refined for a wide range of problems over several decades, and are implemented in the GMIN, 70 OPTIM 71 and PATHSAMPLE 72 programs, 73 all available for use under the GNU General Public License.…”
Section: LL Exploration
mentioning
confidence: 99%
“…Direct exploration and analysis of topography, however, is routinely performed in the chemical physics community, in particular for the characterization of potential energy landscapes. 20 Recently, this methodology has been extended by some of the authors to selected tasks in machine learning such as clustering, 21 and hyperparameter tuning in Gaussian processes, 22 for which we point interested readers to a recent tutorial review. 23 In this contribution, we develop a novel roughness measure inspired by the similarities between model response surfaces and energy landscapes.…”
Section: Introduction
mentioning
confidence: 99%
“…Inclusion of E avg then helps to locate the minima on this hyperline that correspond to the MECPs. A key feature of this procedure, and the penalty functions in particular, is their ability to locate transition state analogues, i.e., the MECP, involving discontinuous first and second derivatives in the adiabatic framework. In eq , a total of three empirical parameters, σ, k, and α, are chosen to converge the energy gap to zero near the seam region while providing a linearly growing energy bias far away from it. Detailed adjustment of the empirical parameters in the penalty potential may be required for some systems, but manually selected values of σ = 10.0, α = 0.005, and k = 0.25 often work well and were used throughout this study.…”
mentioning
confidence: 99%