2009
DOI: 10.1007/s11063-009-9094-4

Singularity and Slow Convergence of the EM algorithm for Gaussian Mixtures

Abstract: Singularities in the parameter spaces of hierarchical learning machines are known to be a main cause of slow convergence of gradient descent learning. The EM algorithm, another learning algorithm that yields a maximum likelihood estimator, also suffers from slow convergence, which often appears when the component overlap is large. We analyze the dynamics of the EM algorithm for Gaussian mixtures around singularities and show that there exists a slow manifold caused by a singular structure, which i…
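To make the setting concrete, the sketch below (not from the paper; a generic numpy implementation) runs the standard E and M steps for a two-component one-dimensional Gaussian mixture. When the two components overlap strongly, successive iterates barely move, which is the slow convergence the abstract attributes to the singular structure of the parameter space.

```python
# Minimal EM sketch for a two-component 1-D Gaussian mixture.
# Illustrative only: fixed iteration count, no convergence test, and no
# safeguards against the degenerate (singular) solutions the paper analyses.
import numpy as np

def em_gmm_1d(x, n_iter=200):
    # Crude initialisation: split the data around its quartiles.
    pi = 0.5
    mu1, mu2 = np.percentile(x, 25), np.percentile(x, 75)
    var1 = var2 = np.var(x)

    def normal_pdf(x, mu, var):
        return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

    for _ in range(n_iter):
        # E step: posterior responsibility of component 1 for each point.
        p1 = pi * normal_pdf(x, mu1, var1)
        p2 = (1 - pi) * normal_pdf(x, mu2, var2)
        r = p1 / (p1 + p2)

        # M step: re-estimate mixing weight, means and variances.
        pi = r.mean()
        mu1 = np.sum(r * x) / np.sum(r)
        mu2 = np.sum((1 - r) * x) / np.sum(1 - r)
        var1 = np.sum(r * (x - mu1) ** 2) / np.sum(r)
        var2 = np.sum((1 - r) * (x - mu2) ** 2) / np.sum(1 - r)
    return pi, (mu1, var1), (mu2, var2)

# With strongly overlapping components (close means, similar variances)
# the iterates crawl along a slow manifold near the singularity.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(0.5, 1.0, 500)])
print(em_gmm_1d(x))
```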

Cited by 34 publications (22 citation statements). References 20 publications.
“…In our experiments too, specifically in high-dimensional datasets, we notice that these singular points act as Milnor attractors (Milnor, 1985; Park and Ozeki, 2009; Amari et al, 2006): specifically, we find that naively running an AD-based inference leads to only one component with a mixing proportion close to 1 (we refer to this component as the dominating component) and the remaining components with mixing proportions close to 0. Moreover, this dominating component changes with the initialization of the parameters; which of these initialized components ends up as the dominating one depends on how close the parameters of a component are to one of the Milnor attractor points in the parameter space.…”
Section: A Singularities in Mixture Models (mentioning)
confidence: 62%
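As an illustration of the collapse described in the statement above, here is a hypothetical helper (names are ours, not the citing paper's) that flags a fitted mixture whose mixing proportions have concentrated on a single dominating component:

```python
# Hypothetical check for the "dominating component" behaviour: one mixing
# proportion close to 1 and the rest close to 0 after fitting.
import numpy as np

def dominating_component(weights, threshold=0.95):
    """Return the index of a dominating component, or None if the fit is balanced."""
    weights = np.asarray(weights, dtype=float)
    k = int(np.argmax(weights))
    return k if weights[k] >= threshold else None

print(dominating_component([0.98, 0.01, 0.01]))  # -> 0 (collapsed fit)
print(dominating_component([0.40, 0.35, 0.25]))  # -> None (balanced fit)
```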
“…Expectation Maximization. EM avoids singularities in low dimensions when the clusters are well separated (Park and Ozeki, 2009). In high dimensions (HD), inference is difficult because the number of parameters grows quadratically with the dimension.…”
Section: Background and Related Work (mentioning)
confidence: 99%
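The quadratic growth mentioned in this statement can be made concrete: a full-covariance Gaussian mixture with K components in D dimensions has K*(D + D(D+1)/2) + (K-1) free parameters, so the covariance terms dominate as D grows. A small sketch of that count (our own, for illustration):

```python
# Parameter count of a full-covariance Gaussian mixture: K means (D each),
# K symmetric covariance matrices (D(D+1)/2 each) and K-1 free mixing weights.
# The covariance term makes the count grow quadratically in D.
def gmm_param_count(k, d):
    return k * (d + d * (d + 1) // 2) + (k - 1)

for d in (2, 10, 100):
    print(d, gmm_param_count(5, d))  # e.g. D=100, K=5 -> 25754 parameters
```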
“…Also, very large databases can be handled by fixing T < N (this is similar to working on a random subsample of the database). This is much faster than traditional density estimation algorithms such as the kernel estimator [12] (which also needs to keep all data in memory) or the Gaussian mixture model [5] estimated with the EM algorithm (whose convergence speed can become extraordinarily slow [11,9]).…”
Section: Algorithm Complexity (mentioning)
confidence: 99%
“…In general, GM involves model selection, i.e., determining the number of components in the mixture (also called the model order), and the estimation of the parameters of each component in the mixture that best fit the statistical model. Computing the parameters of Gaussian mixtures is considered a difficult optimization task, sensitive to the initialization [37] and full of possible singularities [36]. As an optimization problem, the one presented here requires an objective function, which makes use of the Hellinger distance to compare the GM candidate and the original histogram.…”
Section: Introduction (mentioning)
confidence: 99%
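The objective described in this statement can be sketched as follows. This is an illustrative numpy version of the discrete Hellinger distance between a normalised histogram and a Gaussian-mixture candidate evaluated at the bin centres, not the cited paper's implementation; the data, bins, and candidate parameters below are assumptions.

```python
# Discrete Hellinger distance H(p, q) = sqrt(0.5 * sum (sqrt(p_i) - sqrt(q_i))^2)
# between a normalised histogram and a Gaussian-mixture candidate.
import numpy as np

def hellinger(p, q):
    p = np.asarray(p, dtype=float); q = np.asarray(q, dtype=float)
    p = p / p.sum(); q = q / q.sum()
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def gmm_pdf(x, weights, means, stds):
    x = np.asarray(x, dtype=float)
    comps = [w * np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))
             for w, m, s in zip(weights, means, stds)]
    return np.sum(comps, axis=0)

# Histogram of some synthetic data vs. a two-component candidate on the same bins.
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-2, 1, 1000), rng.normal(3, 0.5, 1000)])
counts, edges = np.histogram(data, bins=50)
centres = 0.5 * (edges[:-1] + edges[1:])
candidate = gmm_pdf(centres, [0.5, 0.5], [-2.0, 3.0], [1.0, 0.5])
print(hellinger(counts, candidate))  # small value -> candidate matches the histogram
```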