1990
DOI: 10.1016/0169-7439(90)80135-s

The use of sampling to cluster large data sets

Cited by 12 publications (5 citation statements)
References 9 publications
“…Indeed, for S values that correspond to computation time approximately equal to the run time of GMCA, the latter still performs better on all data sets. On the other hand, large n values, which lead to a quadratic increase in the computation time, generally improve the results, which agrees with findings of (Hopke and Kaufman, 1990). For n values that correspond to computation time of the GMCA, the CLARA can perform as good as the GMCA on the Simulate15 and Yeast384 data sets, however, it still performs worse than the GMCA for the data sets of Simulate30 and Yeast2945, which have comparatively large search space.…”
Section: Further Comparisons (supporting)
confidence: 89%
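The statement above compares CLARA (Clustering LARge Applications) with GMCA by varying S and n. These are presumably the number of samples drawn and the size of each sample. The sketch below shows the sampling idea under stated assumptions, and it is not the authors' implementation. Each of `n_samples` random samples of `sample_size` objects is clustered with a simplified PAM (k-medoids). The resulting medoids are then scored on the *full* data set, and the best set is kept. The function names, the Euclidean dissimilarity, and the default sample size of 40 + 2k are all assumptions made for illustration. The published algorithm also has refinements that are omitted here, such as forcing the best medoids found so far into later samples. Each PAM cost evaluation compares every sampled object with every medoid, and the swap search tries k(n - k) candidates. The cost of a single sample therefore grows at least quadratically in n, which fits the run-time remark in the quote.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Point, b: Point) -> float:
    # Euclidean dissimilarity (an assumption; any dissimilarity would do).
    return math.dist(a, b)


def total_cost(data: Sequence[Point], medoids: Sequence[Point]) -> float:
    # Sum of each object's dissimilarity to its nearest medoid.
    return sum(min(dist(x, m) for m in medoids) for x in data)


def pam(sample: Sequence[Point], k: int) -> List[Point]:
    """Simplified PAM (k-medoids) on one sample: greedy BUILD, then SWAP."""
    medoids: List[Point] = []
    # BUILD: add, one at a time, the object that lowers the cost the most.
    for _ in range(k):
        candidates = [p for p in sample if p not in medoids]
        medoids.append(min(candidates, key=lambda p: total_cost(sample, medoids + [p])))
    # SWAP: replace a medoid by a non-medoid while that strictly lowers the cost.
    cost = total_cost(sample, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for p in sample:
                if p in medoids:
                    continue
                candidate = medoids[:i] + [p] + medoids[i + 1:]
                c = total_cost(sample, candidate)
                if c < cost:
                    medoids, cost, improved = candidate, c, True
    return medoids


def clara(data: Sequence[Point], k: int, n_samples: int = 5,
          sample_size: Optional[int] = None, seed: int = 0) -> Tuple[List[Point], float]:
    """Hypothetical CLARA sketch: PAM on random samples, judged on all data."""
    if n_samples < 1:
        raise ValueError("need at least one sample")
    points = list(data)
    # 40 + 2k default sample size is an assumption for this sketch.
    n = min(len(points), sample_size if sample_size is not None else 40 + 2 * k)
    rng = random.Random(seed)
    best: List[Point] = []
    best_cost = math.inf
    for _ in range(n_samples):
        sample = rng.sample(points, n)
        medoids = pam(sample, k)
        cost = total_cost(points, medoids)  # evaluate on the FULL data set
        if cost < best_cost:
            best, best_cost = medoids, cost
    return best, best_cost
```

In this sketch, raising `n_samples` (S) scales run time linearly. Raising `sample_size` (n) makes each PAM run more expensive but lets each sample represent the data better. That trade-off is the one the quoted comparison explores.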
“…While also here a range of quality measures exist, i.e., the Dunn index [15] or the Calinski-Harabasz score [10], we chose silhouette coefficient [24], because it is [sic] represents an established and intuitive measure for both cohesion and separation of clusters. In our evaluation section (Section V-A) we present a comparative evaluation of the quality computations based on the Dunn index and the Calinski-Harabasz score.…”
Section: Prototypical Implementation (mentioning)
confidence: 99%
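The quote above relies on the silhouette coefficient as a measure of both cohesion and separation. A minimal sketch of the standard definition may make that concrete. For object i, a(i) is the mean dissimilarity to the other members of its own cluster, and b(i) is the smallest mean dissimilarity to the members of any other cluster. Then s(i) = (b(i) - a(i)) / max(a(i), b(i)), and singleton clusters get s(i) = 0. The function names and the Euclidean distance are assumptions. In practice, scikit-learn provides `sklearn.metrics.silhouette_score` and `sklearn.metrics.calinski_harabasz_score` for the same purpose.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def silhouette_values(points: Sequence[Point], labels: Sequence[int]) -> List[float]:
    """Per-object silhouette s(i) = (b - a) / max(a, b), Euclidean distance assumed."""
    clusters: Dict[int, List[int]] = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    values: List[float] = []
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            values.append(0.0)  # convention for singleton clusters
            continue
        # a: cohesion, mean distance to the other members of the own cluster
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: separation, mean distance to the nearest other cluster
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        values.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return values


def silhouette_score(points: Sequence[Point], labels: Sequence[int]) -> float:
    # Average silhouette over all objects; values near 1 indicate a good partition.
    vals = silhouette_values(points, labels)
    return sum(vals) / len(vals)
```

For four points forming two tight, distant pairs, labelling each pair as one cluster gives a score near 1. Mixing the pairs gives a negative score, because each object then lies closer on average to the other cluster than to its own.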
“…The sample size preferred for Hierarchical Cluster Analysis is not more than 200 samples [3]. Reference [4] mentioned large data sets can be problems with Agglomerative Hierarchical Cluster Analysis. An alternative to Agglomerative Hierarchical Cluster Analysis for more than 200 data is given by various forms of nonhierarchical Cluster Analysis [4].…”
Section: Research Site and Instrument (mentioning)
confidence: 99%
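A small piece of arithmetic suggests why large data sets are a problem for agglomerative hierarchical clustering, as the last quote notes. A standard agglomerative method works from the full matrix of pairwise dissimilarities, which holds N(N - 1)/2 values. A sampling method in the spirit of CLARA only needs such a matrix for one sample of n objects at a time. The helper name and the example sizes below are illustrative assumptions.

```python
def pairwise_count(n_objects: int) -> int:
    # Number of distinct pairs, i.e. entries in a condensed dissimilarity matrix.
    return n_objects * (n_objects - 1) // 2


# 200 objects (the size limit cited above) versus 10,000 objects,
# and a single hypothetical CLARA sample of 44 objects.
for n in (200, 10_000, 44):
    print(n, pairwise_count(n))
```

For 10,000 objects this gives 49,995,000 dissimilarities, about 400 MB at 8 bytes each. A sample of 44 objects needs only 946.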