2020
DOI: 10.1093/bioinformatics/btaa243

Pooled variable scaling for cluster analysis

Abstract: We propose a new approach for scaling prior to cluster analysis based on the concept of pooled variance. Unlike available scaling procedures such as the standard deviation and the range, our proposed scale avoids dampening the beneficial effect of informative clustering variables. We confirm through an extensive simulation study and applications to well known real data examples that the proposed scaling method is safe and generally useful. Finally, we use our approach to cluster a high dimensional genomic data…

Cited by 11 publications (20 citation statements)
References 25 publications

“…Table 2 also confirms what has been known since the recent introduction of scaling by $1/\sigma_k^{\mathrm{pool}}$, which is that proceeding in this way, once again in the case of Iris, leads to superior performance. The table goes farther than this, however, since it also makes clear that the fallback role of $\sigma_k$ as a surrogate for $\sigma_k^{\mathrm{pool}}$ in the approach of [5] may be taken more frequently than initially realized. This is shown in the table for all but the Iris data set.…”
Section: Discussion (mentioning)
confidence: 97%
“…Partitioning based clustering was mainly implemented as k-means clustering [14]; however, partitioning around medoids (PAM) was used for comparison [18]. Hierarchical clustering with Ward's linkage [19] was used; however, average and complete linkage were used for comparison in analogy to the choice made in [6]. The Euclidean distance was used as the target of the transformation method presented here.…”
Section: Methods (mentioning)
confidence: 99%
“…A particular problem with Euclidean distance is that it is not scale invariant, i.e., multiplying the data by a common factor changes the distance. Recognizing this potential pitfall in clustering approaches, adapted scaling methods have been proposed that take into account the scale dependence of the Euclidean distance, such as pooled variable scaling (PVS) [6]. In this report, an alternative to the standard z-transform of biomedical data is proposed as a more appropriate approach for clustering biomedical data.…”
Section: Introduction (mentioning)
confidence: 99%
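
The scale dependence described in the statement above can be illustrated with a minimal sketch. It is not taken from the cited papers: the points, the scale factors, and the helper names `euclidean`, `rescale`, and `nearest` are all hypothetical and chosen only for illustration. The sketch shows that a common factor multiplies every distance by that factor, and that rescaling a single variable can change which point is nearest, which is the situation that makes the choice of scaling matter for clustering.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rescale(point, factors):
    """Multiply each coordinate by its own scale factor (hypothetical helper)."""
    return [x * f for x, f in zip(point, factors)]

def nearest(target, candidates, factors):
    """Name of the candidate closest to target after rescaling every point."""
    t = rescale(target, factors)
    return min(candidates,
               key=lambda name: euclidean(t, rescale(candidates[name], factors)))

# Illustrative points, not data from any of the cited studies.
p, q = [1.0, 2.0], [4.0, 6.0]
d_original = euclidean(p, q)                                      # 5.0
d_common = euclidean(rescale(p, [10, 10]), rescale(q, [10, 10]))  # 50.0

origin = [0.0, 0.0]
candidates = {"b": [3.0, 0.0], "c": [0.0, 4.0]}
nn_before = nearest(origin, candidates, [1, 1])  # "b": distance 3 vs 4
nn_after = nearest(origin, candidates, [2, 1])   # "c": distance 6 vs 4
```

A common factor leaves the ordering of distances unchanged, so only per-variable rescaling, as in the second half of the sketch, can alter the neighbour structure.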