Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2002
DOI: 10.1145/775047.775074

Hierarchical model-based clustering of large datasets through fractionation and refractionation

Abstract: The goal of clustering is to identify distinct groups in a dataset. Compared to non-parametric clustering methods like complete linkage, hierarchical model-based clustering has the advantage of offering a way to estimate the number of groups present in the data. However, its computational cost is quadratic in the number of items to be clustered, and it is therefore not applicable to large problems. We review an idea called Fractionation, originally conceived by Cutting, Karger, Pedersen and Tukey for non-param…

Cited by 20 publications (4 citation statements)
References 13 publications
“…Posse (2001) proposed a method based on the minimum spanning tree for obtaining the initial partition. A different approach called fractionation was proposed by Tantrum, Murua, and Stuetzle (2002) in the context of hierarchical model-based clustering, where the complete data set is split up randomly into smaller sets, which are clustered individually. The process is then repeated, but with the smaller sets formed by aggregated clusters rather than randomly.…”
Section: Discussion (mentioning)
confidence: 99%
“…Various approaches to the problem of clustering large data sets have been proposed, including initialization by clustering a sample of the data (Banfield and Raftery 1993; Fayyad and Smyth 1996; Maitra 2001), and using an initial crude partitioning of the entire data set (Posse 2001; Tantrum, Murua, and Stuetzle 2002). The simplest and perhaps most widely applied approach is to apply the clustering method first to a small simple random sample from the data, and then apply the resulting estimated model to the full data set using discriminant analysis (Banfield and Raftery 1993).…”
Section: Introduction (mentioning)
confidence: 99%
“…Also, there are no methods for choosing the number of clusters in model-based hierarchical clustering that are comparable to those available for mixture models. However, this approach could be combined with the mixture modeling approach as a starting scheme for a strategy similar to Strategy W. Tantrum et al (2002) extended model-based hierarchical clustering to large data sets through "refractionation", which splits the data up into many subsets or "fractions". The fractions are clustered by model-based hierarchical clustering into a fixed number of groups and then summarized by their means into meta-observations.…”
Section: Discussion (mentioning)
confidence: 99%
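
The fractionation idea described in the statement above (split the data into fractions, cluster each fraction into a fixed number of groups, summarize the groups by their means into meta-observations, repeat) can be illustrated with a rough sketch. This is not the authors' implementation. It assumes a plain Ward-style agglomerative merge as a stand-in for model-based hierarchical clustering, and the names `agglomerate`, `fractionate`, `fraction_size`, and `reduction` are hypothetical. After the initial shuffle, later passes simply chunk the meta-observations in their current order.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def agglomerate(items, k):
    """Greedily merge (mean, weight) items until at most k remain.

    Assumption: a Ward-style merge cost stands in for the model-based
    merge criterion used in the paper.
    """
    clusters = list(items)
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (mi, wi), (mj, wj) = clusters[i], clusters[j]
                # Increase in within-cluster sum of squares if i and j merge.
                cost = wi * wj / (wi + wj) * sq_dist(mi, mj)
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        (mi, wi), (mj, wj) = clusters[i], clusters[j]
        w = wi + wj
        mean = tuple((wi * a + wj * b) / w for a, b in zip(mi, mj))
        clusters = [c for t, c in enumerate(clusters) if t not in (i, j)]
        clusters.append((mean, w))
    return clusters


def fractionate(data, fraction_size, reduction, k, seed=0):
    """Reduce data to k weighted cluster means via repeated fractionation.

    fraction_size: maximum number of items clustered together in one fraction.
    reduction: share of items kept as meta-observations per fraction (0 < r < 1).
    """
    if not 0 < reduction < 1 or k >= fraction_size:
        raise ValueError("need 0 < reduction < 1 and k < fraction_size")
    rng = random.Random(seed)
    # Every observation starts as a meta-observation of weight 1.
    items = [(tuple(p), 1) for p in data]
    rng.shuffle(items)  # random initial split into fractions
    while len(items) > fraction_size:
        next_items = []
        for start in range(0, len(items), fraction_size):
            fraction = items[start:start + fraction_size]
            # Keep at least k groups so final clusters are not merged early.
            n_groups = max(k, int(len(fraction) * reduction))
            next_items.extend(agglomerate(fraction, n_groups))
        items = next_items
    # Final pass: cluster the remaining meta-observations into k groups.
    return agglomerate(items, k)
```

Each fraction costs time quadratic (here, with the naive pair search, cubic) in `fraction_size` rather than in the full data size, which is the point of the scheme. Calling `fractionate(data, fraction_size=8, reduction=0.5, k=2)` on a list of coordinate tuples returns two `(mean, weight)` pairs.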
“…There have been several recent advances in extending the normal mixture model to large datasets [2,12].…”
Section: Introduction (mentioning)
confidence: 99%