SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations

Meng, Chenlin; He, Yutong; Song, Yang; Song, Jiaming; Wu, Jiajun; Zhu, Jun‐Yan; Ermon, Stefano

doi:10.48550/arxiv.2108.01073

Cited by 74 publications

(112 citation statements)

References 49 publications

“…The generated samples are realistic and diverse, while the conditioning in the stroke paintings is faithfully preserved. Compared to Meng et al (2021b), our model enjoys a 1100× speedup in generation, as it takes only 0.16s to generate one image at 256 resolution vs. 181s for Meng et al (2021b). This experiment confirms that our proposed model enables the application of diffusion models to interactive applications such as image editing.…”

Section: Additional Studies (supporting)

confidence: 67%

Tackling the Generative Learning Trilemma with Denoising Diffusion GANs

Xiao, Kreis, Vahdat. 2021. Preprint.

A wide variety of deep generative models has been developed in the past decade. Yet, these models often struggle with simultaneously addressing three key requirements including: high sample quality, mode coverage, and fast sampling. We call the challenge imposed by these requirements the generative learning trilemma, as the existing models often trade some of them for others. Particularly, denoising diffusion models have shown impressive sample quality and diversity, but their expensive sampling does not yet allow them to be applied in many real-world applications. In this paper, we argue that slow sampling in these models is fundamentally attributed to the Gaussian assumption in the denoising step which is justified only for small step sizes. To enable denoising with large steps, and hence, to reduce the total number of denoising steps, we propose to model the denoising distribution using a complex multimodal distribution. We introduce denoising diffusion generative adversarial networks (denoising diffusion GANs) that model each denoising step using a multimodal conditional GAN. Through extensive evaluations, we show that denoising diffusion GANs obtain sample quality and diversity competitive with original diffusion models while being 2000× faster on the CIFAR-10 dataset. Compared to traditional GANs, our model exhibits better mode coverage and sample diversity. To the best of our knowledge, denoising diffusion GAN is the first model that reduces sampling cost in diffusion models to an extent that allows them to be applied to real-world applications inexpensively. Project page and code: https://nvlabs.github.io/denoising-diffusion-gan.

“…Stroke-based image synthesis: Recently, Meng et al (2021b) propose an interesting application of diffusion models to stroke-based generation. Specifically, they perturb a stroke painting by the forward diffusion process, and denoise it with a diffusion model.…”

Section: Additional Studies (mentioning)

confidence: 99%
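
The perturb-then-denoise procedure described in the statement above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a discrete DDPM-style (variance-preserving) schedule with linear betas, and the names make_schedule, sdedit_sketch, and toy_predict_noise are hypothetical. The toy noise predictor is an analytic stand-in for a trained network; it corresponds to a data distribution concentrated on a single target image, so the sketch runs without any model weights.

```python
import numpy as np

def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumed, commonly used choice)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alpha_bars = np.cumprod(1.0 - betas)
    return betas, alpha_bars

def sdedit_sketch(guide, predict_noise, t0, betas, alpha_bars, rng):
    """Perturb `guide` to noise level t0, then denoise back to level 0."""
    # Forward diffusion: draw x_{t0} ~ q(x_{t0} | guide) in closed form.
    a_bar = alpha_bars[t0]
    x = np.sqrt(a_bar) * guide + np.sqrt(1.0 - a_bar) * rng.standard_normal(guide.shape)
    # Reverse process: ancestral sampling steps from t0 down to 0.
    for t in range(t0, -1, -1):
        eps = predict_noise(x, t)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(1.0 - betas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
        else:
            x = mean  # no noise is added on the final step
    return x

rng = np.random.default_rng(0)
betas, alpha_bars = make_schedule()

# Hypothetical "realistic image" that the toy model has learned.
target = rng.uniform(-1.0, 1.0, size=(8, 8))

def toy_predict_noise(x_t, t):
    # Exact noise estimate when the data distribution is a point mass at `target`.
    return (x_t - np.sqrt(alpha_bars[t]) * target) / np.sqrt(1.0 - alpha_bars[t])

# Crude stand-in for a stroke painting: the target flattened to two levels.
guide = np.sign(target)

edited = sdedit_sketch(guide, toy_predict_noise, t0=400, betas=betas,
                       alpha_bars=alpha_bars, rng=rng)
```

With a real model, the starting level t0 controls the trade-off: a larger t0 adds more noise, which favors realism over faithfulness to the guide. The toy predictor cannot show that trade-off because it always pulls the sample to the same target.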

“…The model can even match styles when editing objects into paintings. We also experiment with SDEdit (Meng et al, 2021) in Figure 4, finding that our model is capable of turning sketches into realistic image edits. In Figure 3 we show how we can use GLIDE iteratively to produce a complex scene using a zero-shot generation followed by a series of inpainting edits.…”

Section: Qualitative Results (mentioning)

confidence: 99%

“…Most previous work that uses diffusion models for inpainting has not trained diffusion models explicitly for this task (Sohl-Dickstein et al, 2015; Meng et al, 2021). In particular, diffusion model inpainting can be performed by sampling from the diffusion model as usual, but replacing the known region of the image with a sample from q(x_t | x_0) after each sampling step.…”

Section: Image Inpainting (mentioning)

confidence: 99%
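
The replacement-based inpainting described in the statement above can be sketched as follows. This is a hedged illustration under stated assumptions rather than code from any of the cited papers: it uses a DDPM-style linear beta schedule, the function name inpaint_by_replacement is hypothetical, and toy_predict_noise is an analytic stand-in for a trained network (it behaves as if the model had learned a single target image).

```python
import numpy as np

def inpaint_by_replacement(known, mask, predict_noise, betas, alpha_bars, rng):
    """Sample as usual, but overwrite the known region after every step.

    `mask` is 1.0 where pixels of `known` are given and 0.0 where they
    must be generated.
    """
    x = rng.standard_normal(known.shape)  # start from pure noise
    for t in range(len(betas) - 1, -1, -1):
        # Ordinary ancestral sampling step: noise level t -> t - 1.
        eps = predict_noise(x, t)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(1.0 - betas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
            # Noised copy of the known pixels at the new level, drawn from q(x_{t-1} | x_0).
            a_bar = alpha_bars[t - 1]
            known_t = (np.sqrt(a_bar) * known
                       + np.sqrt(1.0 - a_bar) * rng.standard_normal(known.shape))
        else:
            x = mean
            known_t = known  # final step: use the clean known pixels
        # Replace the known region, keep the generated region.
        x = mask * known_t + (1.0 - mask) * x
    return x

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)  # assumed linear schedule
alpha_bars = np.cumprod(1.0 - betas)

target = rng.uniform(-1.0, 1.0, size=(8, 8))  # what the toy model "knows"
known = rng.uniform(-1.0, 1.0, size=(8, 8))   # observed image
mask = np.zeros((8, 8))
mask[:, :4] = 1.0                              # left half is observed

def toy_predict_noise(x_t, t):
    # Exact noise estimate when the data distribution is a point mass at `target`.
    return (x_t - np.sqrt(alpha_bars[t]) * target) / np.sqrt(1.0 - alpha_bars[t])

result = inpaint_by_replacement(known, mask, toy_predict_noise, betas, alpha_bars, rng)
```

In this toy setup the left half of result reproduces the observed pixels and the right half is filled in by the model. Because the stand-in predictor acts on each pixel independently, it does not show how a real network would use the known region as context for the generated one.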

GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

Nichol, Dhariwal, Ramesh, et al. 2021. Preprint.

Diffusion models have recently been shown to generate high-quality synthetic images, especially when paired with a guidance technique to trade off diversity for fidelity. We explore diffusion models for the problem of text-conditional image synthesis and compare two different guidance strategies: CLIP guidance and classifier-free guidance. We find that the latter is preferred by human evaluators for both photorealism and caption similarity, and often produces photorealistic samples. Samples from a 3.5 billion parameter text-conditional diffusion model using classifier-free guidance are favored by human evaluators to those from DALL-E, even when the latter uses expensive CLIP reranking. Additionally, we find that our models can be fine-tuned to perform image inpainting, enabling powerful text-driven image editing. We train a smaller model on a filtered dataset and release the code and weights at https://github.com/openai/glide-text2im.

“…Interestingly, diffusion models can go beyond unconditional image synthesis, and have been applied to conditional image generation, including super-resolution [5,17,25], inpainting [30,33], MRI reconstruction [6,13,32], image translation [5,19,27], and so on. One line of works redesigns the diffusion model specifically suitable for the task at hand, thereby achieving remarkable performance on the given task [17,25,27].…”

Section: Introduction (mentioning)

confidence: 99%

Come-Closer-Diffuse-Faster: Accelerating Conditional Diffusion Models for Inverse Problems through Stochastic Contraction

Chung, Sim, Ye. 2021. Preprint.

Figure 1. Reconstruction results of three different tasks: super-resolution, inpainting, and MRI reconstruction. Numbers in parentheses indicate the number of iterations performed for reverse diffusion. The proposed method is compared with canonical conditional diffusion models for each task. (a) Corrupted measurement; (b) ILVR [5], score-SDE [33], and score-POCS [6], respectively, for each task; (c) the proposed method.
