2003
DOI: 10.21236/ada459638

Model-Based Clustering for Image Segmentation and Large Datasets Via Sampling

Abstract: The rapid increase in the size of data sets makes clustering all the more important to capture and summarize the information, at the same time making clustering more difficult to accomplish. If model-based clustering is applied directly to a large data set, it can be too slow for practical application. A simple and common approach is to first cluster a random sample of moderate size, and then use the clustering model found in this way to classify the remainder of the objects. We show that, in its simp…
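The sample-then-classify scheme described in the abstract can be sketched in three steps: draw a random sample, fit the mixture model to the sample only, and classify every object with a single E step. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes one-dimensional data and a two-component Gaussian mixture fitted by EM, and the function names, quantile initialisation, sample size of 500 and simulated data are all hypothetical choices.

```python
import math
import random

def log_normal(x, mu, var):
    """Log-density of the univariate normal N(mu, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def e_step(data, weights, means, variances):
    """E step: posterior probability that each point belongs to each component."""
    resp = []
    for x in data:
        logs = [math.log(w) + log_normal(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)  # log-sum-exp shift avoids underflow for distant points
        exps = [math.exp(l - top) for l in logs]
        total = sum(exps)
        resp.append([e / total for e in exps])
    return resp

def m_step(data, resp, min_var=1e-6):
    """M step: re-estimate mixing weights, means and variances."""
    n, k = len(data), len(resp[0])
    weights, means, variances = [], [], []
    for j in range(k):
        nj = max(sum(r[j] for r in resp), 1e-12)  # guard against an empty component
        mu = sum(r[j] * x for r, x in zip(resp, data)) / nj
        var = sum(r[j] * (x - mu) ** 2 for r, x in zip(resp, data)) / nj
        weights.append(max(nj / n, 1e-12))
        means.append(mu)
        variances.append(max(var, min_var))  # variance floor (hypothetical value)
    return weights, means, variances

def fit_gmm(data, k, iters=50):
    """Fit a k-component 1-D Gaussian mixture by EM, starting from data quantiles."""
    xs = sorted(data)
    means = [xs[int((j + 0.5) * len(xs) / k)] for j in range(k)]
    mean_all = sum(xs) / len(xs)
    var_all = sum((x - mean_all) ** 2 for x in xs) / len(xs)
    params = ([1.0 / k] * k, means, [var_all] * k)
    for _ in range(iters):
        params = m_step(data, e_step(data, *params))
    return params

def classify(data, params):
    """Assign each point to the component with the largest posterior probability."""
    return [max(range(len(r)), key=r.__getitem__) for r in e_step(data, *params)]

# Hypothetical data: two well-separated 1-D clusters of 5000 points each.
rng = random.Random(0)
full = [rng.gauss(0, 1) for _ in range(5000)] + [rng.gauss(10, 1) for _ in range(5000)]

sample = rng.sample(full, 500)   # 1. draw a random sample of moderate size
params = fit_gmm(sample, k=2)    # 2. run EM on the sample only
labels = classify(full, params)  # 3. one E step classifies every object
```

The iterative EM fit touches only the 500 sampled points; each remaining object costs a single pass through the E step.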

Cited by 11 publications (18 citation statements). References 7 publications.
“…In a basic sampling approach, a random sample of the data is used to calculate the clusters and then an additional "E" (expectation) step is used to classify the remaining items. This approach can be improved by building multiple models from the initial sample and then running through several steps of the EM algorithm to fit the whole dataset to these models (Wehrens et al, 2004) or by looking to create new clusters for observations in the full dataset that are fit badly by the sample clusters (Fraley et al, 2005). In addition, as per other clustering approaches, parallel methods have been developed (Kriegel et al, 2005; McNicholas et al, 2010).…”
Section: Modern Large Scale Segmentation Approaches (mentioning; confidence: 99%)
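The passage above contrasts the basic approach (fit on a sample, then one extra E step over everything) with a refinement that runs further EM iterations over the whole data set. A minimal sketch of that contrast follows, assuming a two-component 1-D Gaussian mixture; the cited work by Wehrens et al. (2004) builds multiple candidate models from the sample, which this sketch does not attempt. The function names, starting values and simulated data are hypothetical.

```python
import math
import random

def log_normal(x, mu, var):
    """Log-density of N(mu, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def e_step(data, params):
    """Responsibilities for every point plus the observed-data log-likelihood."""
    weights, means, variances = params
    resp, loglik = [], 0.0
    for x in data:
        logs = [math.log(w) + log_normal(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)
        exps = [math.exp(l - top) for l in logs]
        total = sum(exps)
        loglik += top + math.log(total)  # log-sum-exp of the component terms
        resp.append([e / total for e in exps])
    return resp, loglik

def m_step(data, resp, min_var=1e-6):
    """Weighted re-estimation of mixing weights, means and variances."""
    n, k = len(data), len(resp[0])
    nk = [sum(r[j] for r in resp) for j in range(k)]
    means = [sum(r[j] * x for r, x in zip(resp, data)) / nk[j] for j in range(k)]
    variances = [max(sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nk[j],
                     min_var) for j in range(k)]
    return [c / n for c in nk], means, variances

def em(data, params, iters):
    """Run a fixed number of EM iterations starting from params."""
    for _ in range(iters):
        resp, _ = e_step(data, params)
        params = m_step(data, resp)
    return params

# Hypothetical overlapping data, where a small sample gives only rough estimates.
rng = random.Random(1)
full = [rng.gauss(0, 1) for _ in range(8000)] + [rng.gauss(3, 1) for _ in range(2000)]
sample = rng.sample(full, 200)

# Fit on the sample only, from hypothetical quartile starting values.
xs = sorted(sample)
start = ([0.5, 0.5], [xs[50], xs[150]], [1.0, 1.0])
sample_fit = em(sample, start, iters=100)

# Basic approach: keep the sample model; one E step over the full data classifies it.
_, loglik_sample_model = e_step(full, sample_fit)

# Refinement: a few extra EM iterations over the full data set.
refined = em(full, sample_fit, iters=5)
_, loglik_refined = e_step(full, refined)
```

Because an EM iteration never decreases the observed-data log-likelihood, the refined parameters fit the full data at least as well as the sample estimates, at the price of a few extra passes over all objects.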
“…The performance on noisy data has been demonstrated solely in [40] with data containing only 5% of noise, while we show results for varying noise proportions up to 90%. Third, as most clustering algorithms including SPC have time complexity of O(n²), the subsample size we consider here is O(√n), which is much smaller than the subsample sizes used in the previous work [2,11,40,12,21,26,30]. This is important in the context of big data applications and inherently large datasets, for which only algorithms with O(n) operations would be computationally feasible.…”
Section: Introduction (mentioning; confidence: 96%)
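The complexity argument in the passage above can be written out as a short derivation. It assumes a subsample of size m = c√n for some constant c, a clustering step costing O(m²) on the subsample, and that each of the remaining objects is then assigned to one of K clusters in O(K) time; the constant c and the O(K) assignment cost are assumptions, not details taken from the quoted paper.

```latex
\begin{aligned}
m &= c\sqrt{n}, \\
T_{\text{cluster}} &= O(m^2) = O(c^2 n) = O(n), \\
T_{\text{classify}} &= O\big((n - m)K\big) = O(nK), \\
T_{\text{total}} &= O(n) \quad \text{for fixed } K \text{ and } c.
\end{aligned}
```

For example, with n = 10⁶ and c = 1 the subsample has m = 1000 objects and m(m − 1)/2 = 499,500 pairs, compared with n(n − 1)/2 ≈ 5.0 × 10¹¹ pairs for the full data set.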
“…Later, Fraley and Raftery [11] elaborate on subsample clustering and discriminant analysis for large data and discuss a modification of the simple random subsampling with the goal of finding small, tight clusters. A number of other clustering methods were subsequently developed, following a similar idea [25,40,12,21,26,30]. All of these methods are geared mainly towards computational efficiency, and several were also developed to find small clusters in large datasets [9,25,12,26,30].…”
Section: Introduction (mentioning; confidence: 99%)
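One idea mentioned on this page for recovering small clusters that a random sample is likely to miss is to look for observations in the full data set that the sample model fits badly and to build new clusters from them. The sketch below illustrates that idea in a greatly simplified form and is not the procedure from any cited paper: it assumes a single Gaussian as the sample model and uses a hypothetical cutoff at the log-density of a point 5 standard deviations from the mean. The data are simulated so that a tight cluster of 40 points is easily missed by a sample of 200.

```python
import math
import random
import statistics

def log_normal(x, mu, var):
    """Log-density of N(mu, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

rng = random.Random(2)
# Hypothetical data: one large cluster plus a small, tight cluster of 40 points.
big = [rng.gauss(0, 1) for _ in range(20000)]
tight = [rng.gauss(8, 0.05) for _ in range(40)]
full = big + tight

# Sample model: a single Gaussian fitted to a random sample (simplifying assumption).
sample = rng.sample(full, 200)
mu = statistics.fmean(sample)
var = statistics.pvariance(sample, mu)

# Score every object by its log-density under the sample model and flag poor fits.
cutoff = log_normal(mu + 5 * math.sqrt(var), mu, var)  # hypothetical 5-sd rule
poorly_fit = [x for x in full if log_normal(x, mu, var) < cutoff]

# Seed a new cluster from the poorly fit objects.
new_mu = statistics.fmean(poorly_fit)
new_var = statistics.pvariance(poorly_fit, new_mu)
```

In a full method the new component would then be added to the mixture and the whole model refitted; the fixed cutoff here only stands in for whatever criterion a real procedure would use.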
“…Model-based clustering is an idiom that is often used to describe the application of a mixture model for clustering. Dating at least as far back as Wolfe (1963), interest in model-based clustering is increasing steadily in application areas such as food authenticity, social networks, and microarray gene expression analyses (e.g., Yeung et al, 2001; Wehrens et al, 2004; Krivitsky et al, 2009; McNicholas and Murphy, 2010). In model-based clustering applications, it is common to fit many mixture models within a family (cf.…”
Section: Introduction (mentioning; confidence: 99%)
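For reference, a finite Gaussian mixture of the kind used in model-based clustering, with G components, mixing proportions π_g and component densities φ, together with the posterior membership probabilities used to assign observations to clusters. The notation is ours rather than the quoted paper's. Because the quoted sentence is cut off before naming a selection criterion, BIC is shown only as one common choice for comparing the fitted models, with p the number of free parameters and n the number of observations; under this sign convention a larger value is preferred.

```latex
\begin{aligned}
f(\mathbf{x}) &= \sum_{g=1}^{G} \pi_g\, \phi(\mathbf{x} \mid \boldsymbol{\mu}_g, \boldsymbol{\Sigma}_g),
\qquad \pi_g > 0,\quad \sum_{g=1}^{G} \pi_g = 1, \\
\hat{z}_{ig} &= \frac{\pi_g\, \phi(\mathbf{x}_i \mid \boldsymbol{\mu}_g, \boldsymbol{\Sigma}_g)}
{\sum_{h=1}^{G} \pi_h\, \phi(\mathbf{x}_i \mid \boldsymbol{\mu}_h, \boldsymbol{\Sigma}_h)},
\qquad \text{cluster}(i) = \arg\max_{g} \hat{z}_{ig}, \\
\mathrm{BIC} &= 2 \log L(\hat{\theta}) - p \log n .
\end{aligned}
```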