2003
DOI: 10.1198/016214503000000666

Finding the Number of Clusters in a Dataset

Citations: cited by 735 publications (506 citation statements)
References: 3 publications
“…As seen in Table 5, no correlation was found between dataset length and the optimum number of the clusters, which well suits the literature findings [41,42]. The optimum number of the clusters may vary based on the properties of the dataset, such as the geometric distribution, statistical measures, and neighborhood measures [42,48].…”
Section: Batch Datasets (supporting)
confidence: 79%
“…This is achieved by determining the optimum cluster centers, which minimize the distortion of the samples in the clusters [41]. The process of optimizing cluster centers and members is done iteratively such that k samples are selected arbitrarily as cluster centers and the cluster centers are updated until reaching optimum centers [6,17].…”
Section: K-means Clustering (mentioning)
confidence: 99%
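
The iterative procedure described in the statement above (pick k samples as initial centers, then update the centers until they stop moving) can be sketched roughly as follows. This is a minimal illustration, not the cited authors' implementation: the function name `kmeans`, its parameters `n_iter` and `seed`, the use of squared Euclidean distance, and the toy data are all assumptions made for the example.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd-style k-means sketch (hypothetical helper, not from the cited work)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Arbitrarily select k distinct samples as the initial cluster centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its members; keep it if it has none.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the final centers.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)


# Toy data (assumed): two small, well-separated groups of points.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centers, labels = kmeans(X, k=2)
```

On data this well separated, the sketch should recover one center per group; on real data the result depends on the initial centers, so several restarts are usually advisable.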
“…The number of clusters is selected by testing an hierarchical cluster analysis using Ward's method in order to obtain a general view of the clustering step by step in the corresponding dendrogram. Also, a technique introduced by Sugar and James (2003), based on distortion, a quantity that measures the average distance between each observation and its closest cluster center, is applied confirming our selection. We note, that our purpose is to identify not only the general characteristics of the synoptic conditions favoring snowfall in Athens, but also the details referring to the position, the intensity and the trajectories of the associated circulation sys-…”
Section: Methods (mentioning)
confidence: 99%
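
The distortion mentioned in the statement above, an average distance between each observation and its closest cluster center, can be computed along these lines. This is only a sketch under stated assumptions: it uses the mean squared Euclidean distance, which may differ from the exact distance and normalization in Sugar and James (2003), and the function name `distortion` and the toy data are hypothetical.

```python
import numpy as np


def distortion(X, centers):
    """Mean squared Euclidean distance from each observation to its closest center.

    Assumption: squared Euclidean distance; the cited paper's exact definition
    is not given in the excerpt above.
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Pairwise squared distances, shape (n_samples, n_centers).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Keep the distance to the closest center for each sample, then average.
    return d2.min(axis=1).mean()


# Toy data (assumed): four points on a line forming two obvious groups.
X = [[0, 0], [2, 0], [10, 0], [12, 0]]
one_center = distortion(X, [[6, 0]])            # a single center in the middle
two_centers = distortion(X, [[1, 0], [11, 0]])  # one center per group
```

Here the distortion drops sharply once each group has its own center, which is the kind of behavior a distortion-based choice of the number of clusters relies on.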
“…Moreover, after this point the curve levels off. This indicates that a further increase in the number of parameters will not significantly improve the log-likelihood (Sugar and James, 2003). We observed that, on average, the total log-likelihood of a model increases with the stability of the model.…”
Section: Families Identified Using Randomly Initialized Model (Rim) (mentioning)
confidence: 86%