2009
DOI: 10.1111/j.1541-0420.2008.01064.x

Clustering in the Presence of Scatter

Abstract: A new methodology is proposed for clustering datasets in the presence of scattered observations. Scattered observations are defined as unlike any other, so traditional approaches that force them into groups can lead to erroneous conclusions. Our suggested approach is a scheme which, under assumption of homogeneous spherical clusters, iteratively builds cores around their centers and groups points within each core while identifying points outside as scatter. In the absence of scatter, the algorithm reduces to k…
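
The core-building idea outlined in the abstract can be illustrated with a small toy sketch. This is not the authors' algorithm: the fixed, user-supplied `radius`, the user-supplied starting `centers`, the label `-1` for scatter, and the function name `cluster_with_scatter` are all assumptions made for illustration only. The sketch alternates between assigning each point to its nearest center if it lies inside that center's core (otherwise marking it as scatter) and recomputing each center from its core points.

```python
import math


def cluster_with_scatter(points, centers, radius, n_iter=20):
    """Toy core-based clustering; label -1 marks scatter.

    Assumptions (not taken from the paper): a fixed core radius and
    initial centers, both supplied by the caller.
    """
    centers = [tuple(c) for c in centers]
    labels = [-1] * len(points)
    for _ in range(n_iter):
        # Assign each point to its nearest center, or to scatter (-1)
        # when it falls outside that center's core.
        new_labels = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            k = min(range(len(centers)), key=dists.__getitem__)
            new_labels.append(k if dists[k] <= radius else -1)
        # Recompute each center from the points inside its core only;
        # a center with an empty core is left where it is.
        new_centers = []
        for k, c in enumerate(centers):
            members = [p for p, lab in zip(points, new_labels) if lab == k]
            if members:
                c = tuple(
                    sum(m[d] for m in members) / len(members)
                    for d in range(len(c))
                )
            new_centers.append(c)
        # Stop once neither the labels nor the centers change.
        if new_labels == labels and new_centers == centers:
            break
        labels, centers = new_labels, new_centers
    return labels, centers


# Hypothetical data: two tight groups plus one far-away point.
points = [
    (0.0, 0.0), (0.2, 0.0), (0.0, 0.2),
    (5.0, 5.0), (5.2, 5.0), (5.0, 5.2),
    (10.0, -10.0),
]
labels, centers = cluster_with_scatter(
    points, centers=[(0.5, 0.5), (4.5, 4.5)], radius=1.0
)
print(labels)
```

On this made-up data the far-away point stays outside both cores and keeps the scatter label, while the two tight groups are recovered. A fixed radius is the crudest possible choice of core and is used here only to keep the sketch short.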

Cited by 20 publications (27 citation statements)
References 33 publications

“…The k-means algorithm does not make distributional assumptions but may be cast in a semi-parametric framework [12], [41]. We now show that the following holds even without the Gaussian distributional assumptions that underlie Result 1:…”
Section: Introduction (mentioning)
confidence: 70%
“…These drawbacks have been tackled by more recent methods. Maitra and Ramler (2009), e.g., proposed a generalization of the k-means algorithm that explicitly considers scattered points. Some sophisticated grouping algorithms were proposed that only require specifying, e.g., a maximal cluster size (Scharl and Leisch, 2006), a minimal cluster size (Manley et al, 2008, relying on point trajectories over time) or both a minimal cluster size and an effective maximal cluster radius (Ester et al, 1996; Ankerst et al, 1999).…”
Section: Introduction (mentioning)
confidence: 99%
“…Popular methods include hierarchical clustering [Eisen et al (1998)], K-means [Dudoit and Fridlyand (2002)], mixture model-based approaches [Xie, Pan and Shen (2008); McLachlan, Bean and Peel (2002)] and nonparametric approaches [Qin (2006)], for analysis of single transcriptomic study. Resampling and ensemble methods have been used to improve stability of the clustering analysis [Kim et al (2009); Swift et al (2004)] or to pursue tight clusters by leaving scattered samples that are different from major clusters [Tseng (2007); Tseng and Wong (2005); Maitra and Ramler (2009)]. Witten and Tibshirani (2010) proposed a sparse K-means algorithm that can effectively select gene features and perform sample clustering simultaneously.…”
Section: Introduction (mentioning)
confidence: 99%