2010
DOI: 10.1198/jcgs.2009.08054

Simulating Data to Study Performance of Finite Mixture Modeling and Clustering Algorithms

Abstract: A new method is proposed to generate sample Gaussian mixture distributions according to pre-specified overlap characteristics. Such methodology is useful in the context of evaluating performance of clustering algorithms. Our suggested approach involves derivation and calculation of the exact overlap between every cluster pair, measured in terms of their total probability of misclassification, and then guided simulation of Gaussian components satisfying pre-specified overlap characteristics. The algorithm is…

Cited by 132 publications (128 citation statements)
References 31 publications
“…The key result of Maitra and Melnykov (2010) is a closed expression for the probability of overlapping $\omega_{j|i}$ defined in Eq. (9), which is shown to be (for multivariate Gaussian mixtures) the cumulative distribution function (cdf) of a linear combination of noncentral $\chi^2$ distributions $U_l$ with 1 degree of freedom plus a linear combination of $W_l \sim N(0, 1)$ random variables:…”
Section: Simulating Regression Mixture Data With Mixsimreg (mentioning)
confidence: 99%
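The general expression quoted above is elided in the statement and is not reproduced here. As a rough illustration only, the sketch below covers the special case in which both components share one covariance matrix: the quadratic terms of the two log-densities then cancel, and the misclassification probability reduces to a single normal cdf (the standard linear discriminant result). The function name `omega_equal_cov` and its arguments are hypothetical, not taken from the cited papers.

```python
from math import log
from statistics import NormalDist


def omega_equal_cov(delta, pi_i=0.5, pi_j=0.5):
    """Misclassification probability omega_{j|i} for two Gaussian components
    that share one covariance matrix (an assumed special case, not the
    general formula of the cited paper).

    delta      : Mahalanobis distance between the two means (must be > 0)
    pi_i, pi_j : mixing proportions of components i and j
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    # For X ~ N(mu_i, Sigma), log(pi_j phi_j / pi_i phi_i) is normal with
    # mean log(pi_j / pi_i) - delta^2 / 2 and variance delta^2.
    return NormalDist().cdf(-delta / 2 + log(pi_j / pi_i) / delta)


# Two equally weighted components whose means are 2 standard deviations apart.
print(round(omega_equal_cov(2.0), 4))  # 0.1587
```

With unequal covariance matrices the quadratic terms no longer cancel, which is where the chi-squared part of the quoted result comes in.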
“…The approach, known as MixSim (Maitra and Melnykov 2010; Melnykov et al 2012), was originally introduced in the multivariate context to generate samples from Gaussian mixture models $\sum_{g=1}^{G} \pi_g \phi(y; \mu_g, \Sigma_g)$ defined in a $v$-variate space, for given data vector $y$, group occurrence probabilities (or mixing proportions) $\pi_g$, group centroids $\mu_g$ and group covariance matrices $\Sigma_g$. If $i$ and $j$ ($i \neq j = 1, \ldots, G$) are clusters indexed by $\phi(y; \mu_i, \Sigma_i)$ and $\phi(y; \mu_j, \Sigma_j)$ with occurrence probabilities $\pi_i$ and $\pi_j$, then the misclassification probability with respect to cluster $i$ (i.e.…”
Section: Simulating Regression Mixture Data With Mixsimreg (mentioning)
confidence: 99%
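To make the mixture notation concrete, here is a minimal sketch of drawing labeled samples from a Gaussian mixture with given mixing proportions, centroids and covariances. It is not the MixSim generator: the function `sample_mixture` is hypothetical, and it assumes diagonal covariance matrices (passed as per-coordinate standard deviations) so that only the standard library is needed.

```python
import random


def sample_mixture(n, weights, means, sds, seed=0):
    """Draw n points from a Gaussian mixture with diagonal covariances.

    weights[g] : mixing proportion pi_g
    means[g]   : list of v coordinates of the centroid mu_g
    sds[g]     : list of v per-coordinate standard deviations
                 (diagonal Sigma_g; an assumption made for simplicity)
    Returns (points, labels), where labels[k] is the component of points[k].
    """
    rng = random.Random(seed)
    # First pick a component for every draw, then sample from that component.
    labels = rng.choices(range(len(weights)), weights=weights, k=n)
    points = [
        [rng.gauss(m, s) for m, s in zip(means[g], sds[g])]
        for g in labels
    ]
    return points, labels


points, labels = sample_mixture(
    1000,
    weights=[0.3, 0.7],
    means=[[0.0, 0.0], [3.0, 3.0]],
    sds=[[1.0, 1.0], [0.5, 2.0]],
)
print(len(points), labels.count(1) / len(labels))
```

Keeping the labels alongside the points is what makes such simulated data usable for scoring a clustering algorithm against the known grouping.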
“…The overlap characteristics of mixtures obtained from the generator [22] were controlled by the two parameters: $\bar{\omega}$ specifying average pairwise overlap between components and $\check{\omega}$ specifying maximum pairwise overlap. In the experiments, the number of components $K$ was fixed at 20 and mixtures with dimension $d \in \{2, 5, 10\}$ were generated.…”
Section: Synthetic Datasets (mentioning)
confidence: 99%
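The two control parameters are summaries over all component pairs. The sketch below shows how they could be computed from a table of misclassification probabilities, taking the overlap of a pair to be the sum of its two misclassification probabilities as described in the abstract. The function `overlap_summary` and the example numbers are illustrative assumptions, not output of the generator.

```python
from itertools import combinations


def overlap_summary(omega):
    """Average and maximum pairwise overlap.

    omega[i][j] holds the misclassification probability omega_{j|i};
    diagonal entries are ignored. The matrix layout is an assumption
    made for this sketch.
    """
    k = len(omega)
    if k < 2:
        raise ValueError("need at least two components")
    # Overlap of the pair (i, j) is omega_{j|i} + omega_{i|j}.
    pair_overlaps = [
        omega[i][j] + omega[j][i] for i, j in combinations(range(k), 2)
    ]
    return sum(pair_overlaps) / len(pair_overlaps), max(pair_overlaps)


# Made-up misclassification probabilities for three components.
omega = [
    [0.00, 0.02, 0.01],
    [0.03, 0.00, 0.10],
    [0.01, 0.08, 0.00],
]
avg, mx = overlap_summary(omega)
print(round(avg, 4), round(mx, 4))  # 0.0833 0.18
```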
“…In the experiments with synthetic data, a generator recently proposed in [22] was employed which randomly generates Gaussian mixtures according to the user-defined overlap characteristics. The overlap $\omega_{ij}$ between two clusters $i$ and $j$ is defined as the sum of two misclassification probabilities $\omega_{j|i}$ and $\omega_{i|j}$ where:…”
Section: Synthetic Datasets (mentioning)
confidence: 99%
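The definition of the two misclassification probabilities is cut off in the statement above. Assuming the usual reading, in which $\omega_{j|i}$ is the probability that a draw from component $i$ has a larger weighted density under component $j$, the overlap can be approximated by simulation. The sketch below is a Monte Carlo estimate for univariate components, not the exact calculation proposed in the paper; all function names are hypothetical.

```python
import math
import random


def log_weighted_density(x, pi, mu, sd):
    """log(pi * N(x; mu, sd^2)) for a univariate Gaussian component."""
    z = (x - mu) / sd
    return math.log(pi) - math.log(sd) - 0.5 * math.log(2 * math.pi) - 0.5 * z * z


def misclassification_mc(comp_i, comp_j, n=100_000, seed=0):
    """Monte Carlo estimate of omega_{j|i}: the share of draws from
    component i whose weighted density is larger under component j.
    Each component is a tuple (pi, mu, sd)."""
    rng = random.Random(seed)
    pi_i, mu_i, sd_i = comp_i
    pi_j, mu_j, sd_j = comp_j
    hits = 0
    for _ in range(n):
        x = rng.gauss(mu_i, sd_i)
        own = log_weighted_density(x, pi_i, mu_i, sd_i)
        other = log_weighted_density(x, pi_j, mu_j, sd_j)
        if own < other:
            hits += 1
    return hits / n


def overlap_mc(comp_i, comp_j, n=100_000, seed=0):
    """Pairwise overlap omega_ij = omega_{j|i} + omega_{i|j} (estimated)."""
    return (misclassification_mc(comp_i, comp_j, n, seed)
            + misclassification_mc(comp_j, comp_i, n, seed + 1))


# Equal weights, unit variances, means 2 apart: the estimate should land
# close to 2 * Phi(-1), roughly 0.317.
a = (0.5, 0.0, 1.0)
b = (0.5, 2.0, 1.0)
print(overlap_mc(a, b))
```

Simulation of this kind is noisy and slow compared with an exact formula, which is the motivation for the closed-form overlap that the cited paper derives.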