2021
DOI: 10.1038/s41598-021-98126-1

Distance-based clustering challenges for unbiased benchmarking studies

Abstract: Benchmark datasets with predefined cluster structures and high-dimensional biomedical datasets outline the challenges of cluster analysis: clustering algorithms are limited in their clustering ability in the presence of clusters defining distance-based structures resulting in a biased clustering solution. Data sets might not have cluster structures. Clustering yields arbitrary labels and often depends on the trial, leading to varying results. Moreover, recent research indicated that all partition comparison me…

Cited by 18 publications (28 citation statements)
References 69 publications
“…In 2010, the Ministry of Education put forward new requirements for it, requiring that more attention should be paid to it in colleges, and the education target has also changed from the original entrepreneurs to all college students. To the whole process of college students' talent training, at the same time, the core of it is to improve college students' awareness of it and it [1]. In 2016, the Ministry of Education once again focused on this, emphasizing the reform content of it, and should continuously enhance the spirit of independent innovation, entrepreneurial awareness and innovation, and entrepreneurship ability of students [2].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Descriptions and access to typical density- and distance-based structures [31] and algorithms [32] is provided. In the subsequent work [33], the pitfalls and challenges of automated cluster detection or cluster analysis pipelines are highlighted. This work shows that…”
Section: Human-in-the-loop Projection-based Clustering (Hil-pbc)
Citation type: mentioning
confidence: 99%
“…• Parameter optimization on datasets without distance-based structures, • Algorithm selection using unsupervised quality measures on biomedical data, and • Benchmarking detection algorithms with first-order statistics or box plots or a small number of repetitions of identical algorithm calls are biased and often not recommended [33]. This serves as a motivation to investigate HIL approaches for structure identification toward pattern recognition as opposed to automatic algorithmic detection [22].…”
Section: Human-in-the-loop Projection-based Clustering (Hil-pbc)
Citation type: mentioning
confidence: 99%
“…Researchers can use cases of each sample to benchmark unsupervised machine learning methods (c.f. [9]) because domain experts (i.e., clinicians) distinguish pB samples from BM samples, and leukemia BM samples versus non-leukemia BM samples based on distributions of biological cell populations by looking at two-dimensional scatter plots. As a consequence, clear, straightforward patterns in data that have a biological meaning are visible to the human eye (c.f.…”
Section: Value Of the Data
Citation type: mentioning
confidence: 99%