2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
DOI: 10.1109/cvpr.2018.00361

Sliced Wasserstein Distance for Learning Gaussian Mixture Models

Abstract: Gaussian mixture models (GMM) are powerful parametric tools with many applications in machine learning and computer vision. Expectation maximization (EM) is the most popular algorithm for estimating the GMM parameters. However, EM guarantees only convergence to a stationary point of the log-likelihood function, which could be arbitrarily worse than the optimal solution. Inspired by the relationship between the negative log-likelihood function and the Kullback-Leibler (KL) divergence, we propose an alternative …
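As a rough, self-contained sketch of the quantity the abstract builds on (our illustration, not the authors' implementation; the function name and defaults are made up), the sliced Wasserstein distance between two equally sized sample sets can be estimated with random one-dimensional projections:

```python
import numpy as np

def sliced_wasserstein_distance(X, Y, n_projections=100, p=2, seed=None):
    """Monte-Carlo estimate of the sliced p-Wasserstein distance between two
    empirical distributions given as sample matrices X, Y of shape (n, dim).
    Assumes equal sample sizes and uniform weights for simplicity."""
    rng = np.random.default_rng(seed)
    dim = X.shape[1]
    # Draw random directions uniformly on the unit sphere.
    theta = rng.normal(size=(n_projections, dim))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project both sample sets onto each direction (1-D marginals).
    X_proj = X @ theta.T            # shape (n, n_projections)
    Y_proj = Y @ theta.T
    # 1-D optimal transport has a closed form: sort and compare quantiles.
    X_sorted = np.sort(X_proj, axis=0)
    Y_sorted = np.sort(Y_proj, axis=0)
    return np.mean(np.abs(X_sorted - Y_sorted) ** p) ** (1.0 / p)
```

In a GMM-fitting setting along the lines of the abstract, X would be the observed data and Y samples drawn from the current mixture, with this distance minimized over the mixture parameters instead of running EM.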

Cited by 99 publications (109 citation statements)
References 47 publications
“…For this reason, efficient approximations and variants for it have been an active research area. In this paper, we used the Sliced Wasserstein Distance (SWD) [19], which is a good approximation of optimal transport [20] and additionally can be computed more efficiently.…”
Section: Proposed Solution
confidence: 99%
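The efficiency claim in the statement above rests on the fact that one-dimensional optimal transport has a closed form: sort the two projected samples and average the gaps. A quick check of that identity (our example, using SciPy's generic 1-D routine as a reference) might look like this:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
u = rng.normal(0.0, 1.0, size=500)
v = rng.normal(2.0, 0.5, size=500)

# Closed form for equal-size 1-D samples: sort and average the gaps.
w1_sorted = np.mean(np.abs(np.sort(u) - np.sort(v)))
# Reference value from SciPy's 1-D Wasserstein-1 implementation.
w1_scipy = wasserstein_distance(u, v)
print(w1_sorted, w1_scipy)  # the two values agree up to floating-point error
```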
“…Second, SWD is non-zero even for two probability distributions with non-overlapping supports. As a result, it has non-vanishing gradients, and first-order gradient-based optimization algorithms can be used to solve optimization problems involving SWD terms [16,20]. This is important, as most optimization problems for training deep neural networks are solved using gradient-based methods, e.g., Stochastic Gradient Descent (SGD).…”
confidence: 99%
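To make the non-vanishing-gradient point concrete, here is a minimal PyTorch sketch (ours, not taken from the cited works) that uses an SWD term as a differentiable loss and minimizes it with SGD; shapes, names, and hyperparameters are illustrative:

```python
import torch

def swd_loss(x, y, n_projections=64):
    """Differentiable sliced-Wasserstein-2 loss between two equally sized
    batches x, y of shape (n, d); gradients flow through the sorting step."""
    d = x.shape[1]
    theta = torch.randn(n_projections, d, device=x.device)
    theta = theta / theta.norm(dim=1, keepdim=True)
    x_sorted, _ = torch.sort(x @ theta.T, dim=0)
    y_sorted, _ = torch.sort(y @ theta.T, dim=0)
    return ((x_sorted - y_sorted) ** 2).mean()

# Example: pull a parametrized point cloud toward a target with plain SGD.
target = torch.randn(256, 2) + 3.0
params = torch.zeros(256, 2, requires_grad=True)
opt = torch.optim.SGD([params], lr=0.5)
for _ in range(200):
    opt.zero_grad()
    loss = swd_loss(params, target)
    loss.backward()   # gradients do not vanish even for disjoint supports
    opt.step()
```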
“…It can be shown that training with the MLE criterion converges to a minimization of the KL-divergence as the sample size increases [23]. From Eqn (1) we see that any model admitting a differentiable density p_θ(x) can be trained via backpropagation.…”
Section: A. Maximum Likelihood Training
confidence: 99%
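The cited fact reduces to a standard identity (our paraphrase, not an equation from the cited paper): the empirical negative log-likelihood converges to the cross-entropy, which differs from the KL divergence only by a term independent of θ.

```latex
-\frac{1}{N}\sum_{i=1}^{N}\log p_\theta(x_i)
  \;\xrightarrow[N\to\infty]{}\;
  \mathbb{E}_{x\sim q}\!\left[-\log p_\theta(x)\right]
  \;=\; D_{\mathrm{KL}}\!\left(q \,\|\, p_\theta\right) + H(q)
```

Since the entropy H(q) of the data distribution q does not depend on θ, maximizing the likelihood asymptotically minimizes the KL divergence from q to the model p_θ.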
“…The gradients on this space are also well behaved, which promises to achieve superior optimisation. Most recently, Kolouri et al. (Kolouri, Rohde, and Hoffmann 2018) have adopted the Wasserstein distance for solving GMM problems. However, the aforementioned probability constraint enforces an extremely small stepsize (learning rate) during optimisation.…”
Section: Introduction
confidence: 99%
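The "probability constraint" referred to above is that the GMM mixture weights must stay on the probability simplex, which restricts how large a naive gradient step can be. One common workaround, shown below purely as an illustration (not necessarily what the cited works do), is to optimize unconstrained logits and map them through a softmax:

```python
import torch

# Optimize unconstrained logits; the softmax guarantees that any SGD step,
# regardless of step size, still yields non-negative weights summing to 1.
logits = torch.zeros(5, requires_grad=True)      # unconstrained parameters
opt = torch.optim.SGD([logits], lr=1.0)          # a large step size is fine

opt.zero_grad()
weights = torch.softmax(logits, dim=0)           # valid mixture weights
toy_loss = (weights - torch.tensor([0.5, 0.2, 0.1, 0.1, 0.1])).pow(2).sum()
toy_loss.backward()
opt.step()
```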