Model Selection Strategies for Determining the Optimal Number of Overlapping Clusters in Additive Overlapping Partitional Clustering

Rossbroich, Julian; Durieux, Jeffrey; Wilderjans, Tom F.

doi:10.1007/s00357-021-09409-1

Cited by 4 publications

(3 citation statements)

References 107 publications

(155 reference statements)


“…Rossbroich et al (2022) addressed the (underinvestigated) issue of selecting the number of clusters in an overlapping clustering model (ADPROCLUS: Depril et al, 2008; Mirkin, 1987). For this purpose, they proposed and compared 13 model selection strategies, 11 of which were (minor or major) adaptations of similar strategies for partitioning models, and 2 of which were crossvalidation‐based.…”

Section: A Few Illustrative Examples (mentioning)

confidence: 99%

“…For the actual simulations, either own code or existing data generators can be used. Regarding the latter, over the past decades quite a few generators have been proposed, including the Milligan (1985) algorithm for generating artificial test clusters, OCLUS (Steinley & Henson, 2005), the Qiu and Joe (2006) random cluster generation algorithm, and MixSim (Melnykov et al, 2012). In the justification of these generators, quite some emphasis has been put on the aspects of separability and overlap, where overlap refers to intensional rather than to extensional overlap, that is to say, overlap in terms of variables or component distributions, with all generated clusterings being partitions.…”

Section: Issues (mentioning)

confidence: 99%

“…As a way out, one necessarily had to turn to benchmarking studies, with instances of seminal benchmarking attempts being reported by, for example, Baker (1974), Hubert (1974), and especially Milligan (Milligan, 1980, 1985; Milligan et al, 1983; Milligan & Cooper, 1985; for overviews of earlier benchmarking work in the area, see, e.g., Jain & Dubes, 1988; Milligan, 1981a, 1981b). Later on, there has been some follow‐up to this seminal work (e.g., Anderlucci & Hennig, 2014; Arbelaitz et al, 2013; Costa et al, 2022; Hennig, 2022; Rossbroich et al, 2022; Schepers et al, 2006; Shireman et al, 2017; Steinley, 2003; Steinley & Brusco, 2008; Šulc & Řezanková, 2019; Wilderjans et al, 2013). Nevertheless, there is much less of a benchmarking tradition in the clustering area than in the field of supervised learning.…”

Section: Introduction (mentioning)

confidence: 99%


A white paper on good research practices in benchmarking: The case of cluster analysis

Mechelen, Boulesteix, Dangl, et al. 2023. WIREs Data Min & Knowl

To achieve scientific progress in terms of building a cumulative body of knowledge, careful attention to benchmarking is of the utmost importance, requiring that proposals of new methods are extensively and carefully compared with their best predecessors, and existing methods subjected to neutral comparison studies. Answers to benchmarking questions should be evidence‐based, with the relevant evidence being collected through well‐thought‐out procedures, in reproducible and replicable ways. In the present paper, we review good research practices in benchmarking from the perspective of the area of cluster analysis. Discussion is given to the theoretical, conceptual underpinnings of benchmarking based on simulated and empirical data in this context. Subsequently, the practicalities of how to address benchmarking questions in clustering are dealt with, and foundational recommendations are made based on existing literature.

This article is categorized under: Fundamental Concepts of Data and Knowledge > Data Concepts; Fundamental Concepts of Data and Knowledge > Key Design Issues in Data Mining; Technologies > Structure Discovery and Clustering


Optimal Band Selection Using Evolutionary Machine Learning to Improve the Accuracy of Hyper-spectral Images Classification: a Novel Migration-Based Particle Swarm Optimization

Vahidi, Aghakhani, Martín, et al. 2023. J Classif

Cluster Validation Based on Fisher’s Linear Discriminant Analysis

Kächele, Schneider 2024. J Classif

Cluster analysis aims to find meaningful groups, called clusters, in data. The objects within a cluster should be similar to each other and dissimilar to objects from other clusters. The fundamental question arising is whether found clusters are “valid clusters” or not. Existing cluster validity indices are computation-intensive, make assumptions about the underlying cluster structure, or cannot detect the absence of clusters. Thus, we present a new cluster validation framework to assess the validity of a clustering and determine the underlying number of clusters $k^*$. Within the framework, we introduce a new merge criterion analyzing the data in a one-dimensional projection, which maximizes the ratio of between-cluster variance to within-cluster variance in the clusters. Nonetheless, other local methods can be applied as a merge criterion within the framework. Experiments on synthetic and real-world data sets show promising results for both the overall framework and the introduced merge criterion.
