2021
DOI: 10.48550/arxiv.2107.00630
Preprint

Variational Diffusion Models

Abstract: Diffusion-based generative models have demonstrated a capacity for perceptually impressive synthesis, but can they also be great likelihood-based models? We answer this in the affirmative, and introduce a family of diffusion-based generative models that obtain state-of-the-art likelihoods on standard image density estimation benchmarks. Unlike other diffusion-based models, our method allows for efficient optimization of the noise schedule jointly with the rest of the model. We show that the variational lower bound…

Cited by 20 publications (19 citation statements) · References 17 publications
“…In our preliminary experiments, we found that a naive implementation of Proposition 2 led to unstable training due to the high variance in the objective across the different timescales t. This finding is in accordance with recent work on diffusion probabilistic models (Nichol and Dhariwal, 2021; Kingma et al., 2021; Song et al., 2021a), which emphasizes the critical role of applying the proper weighting function λ(t) rather than sampling t ∼ U[0, 1] uniformly at random. This problem is also exacerbated by the fact that our training objective requires backpropagating through the score network s_θ^time(x, t) with respect to t, which can cause training to diverge or progress extremely slowly for certain design choices.…”
Section: Variance Reduction Via Importance Weighting (supporting)
confidence: 90%
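
The variance problem this snippet describes is easy to reproduce in miniature. Below is a minimal, self-contained sketch (illustrative only, not code from any of the cited papers; `per_t_loss` and the proposal choice are made-up stand-ins) contrasting naive uniform sampling of t with importance sampling from a loss-aware proposal, reweighted to keep the estimate unbiased, in the spirit of the loss-aware timestep sampling of Nichol and Dhariwal (2021):

```python
import numpy as np

rng = np.random.default_rng(0)

def per_t_loss(t):
    # Hypothetical stand-in for the per-timestep training loss; in practice
    # this would be the model's (weighted) denoising loss at time t.
    return 1.0 + 10.0 * t**2

def uniform_samples(n):
    # Naive estimator: t ~ U[0, 1]. Unbiased, but high variance whenever
    # the loss magnitude varies strongly across timescales t.
    t = rng.uniform(0.0, 1.0, size=n)
    return per_t_loss(t)

def importance_samples(n):
    # Importance sampling: draw t from a proposal q(t) that tracks the loss
    # magnitude, then divide by q(t) so the estimator stays unbiased.
    # Here q(t) ∝ sqrt(per_t_loss(t)), discretized on a midpoint grid --
    # an assumption echoing Nichol & Dhariwal (2021), not the exact scheme
    # of any cited paper.
    grid = (np.arange(1000) + 0.5) / 1000.0
    pmf = np.sqrt(per_t_loss(grid))
    pmf /= pmf.sum()
    idx = rng.choice(grid.size, size=n, p=pmf)
    density = pmf[idx] * grid.size  # probability mass -> density on [0, 1]
    return per_t_loss(grid[idx]) / density

n = 100_000
u, iw = uniform_samples(n), importance_samples(n)
print(f"uniform:    mean={u.mean():.3f}, std={u.std():.3f}")
print(f"importance: mean={iw.mean():.3f}, std={iw.std():.3f}")
```

Both estimators target the same expected loss; the importance-sampled one simply concentrates draws where the integrand is large and reweights accordingly, which is one concrete reading of "applying the proper weighting function λ(t)" rather than relying on uniform t.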
“…Additionally, DRE-∞ takes longer to converge as q(x) moves further from p(x), though this speaks to the challenging nature of the DRE problem as a whole. It would be interesting to investigate whether there is a time-dependent function λ(t) such that the time score matching loss corresponds to the maximum likelihood training of a binary classifier (Song et al., 2021a; Kingma et al., 2021). Additionally, exploring optimal integration paths between p(x) and q(x) would be exciting future work.…”
Section: Discussion (mentioning)
confidence: 99%
“…In addition, with the signal-to-noise ratio incorporated into the training objective in Eq. 20, we can get L_{2N_s} − L_{N_s} < 0 [48]. Therefore, a larger number of diffusion steps can further improve performance.…”
Section: Influence Of Hyperparameters (mentioning)
confidence: 95%
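
For context, the inequality quoted above traces back to the discrete-time loss of Kingma et al. (2021), reference [48], written in terms of the signal-to-noise ratio SNR(t) = α_t²/σ_t². A sketch in that paper's notation follows (the citing paper's Eq. 20 is its own SNR-weighted variant and is not reproduced here):

```latex
% Discrete-time diffusion loss of Kingma et al. (2021), from memory of their
% notation: noise schedule (\alpha_t, \sigma_t), SNR(t) = \alpha_t^2/\sigma_t^2,
% and a T-step loss
\[
  \mathcal{L}_T(\mathbf{x}) \;=\; \frac{T}{2}\,
  \mathbb{E}_{\boldsymbol{\epsilon} \sim \mathcal{N}(0, \mathbf{I}),\;
              i \sim U\{1, \dots, T\}}
  \Big[ \big( \mathrm{SNR}(s) - \mathrm{SNR}(t) \big)\,
        \big\lVert \mathbf{x} - \hat{\mathbf{x}}_\theta(\mathbf{z}_t; t)
        \big\rVert_2^2 \Big],
  \qquad s = \tfrac{i-1}{T},\quad t = \tfrac{i}{T}.
\]
% Kingma et al. prove \mathcal{L}_{2T}(\mathbf{x}) < \mathcal{L}_T(\mathbf{x}):
% doubling the number of diffusion steps strictly decreases this loss, which is
% the sense in which more steps "can further improve performance" above.
```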