ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models

Choi, Joon-Seok; Kim, Sung-Won; Jeong, Yonghyun; Gwon, Youngjune; Yoon, Sungroh

doi:10.1109/iccv48922.2021.01410

Cited by 494 publications

(392 citation statements)

References 29 publications

“…Diffusion models As probabilistic generative models for unsupervised modeling (Ho et al, 2020), diffusion models have shown strong sample quality and diversity in image synthesis (Dhariwal & Nichol, 2021; Song et al, 2021a). Since then, they have been used in many image editing tasks, such as image-to-image translation (Meng et al, 2021; Choi et al, 2021; Saharia et al, 2021) and text-guided image editing (Kim & Ye, 2021; Nichol et al, 2021). Although adversarial purification can be considered a special image editing task, and DiffPure in particular shares a similar procedure with SDEdit (Meng et al, 2021), none of these works apply diffusion models to improve model robustness.…”

Section: Related Work (mentioning)

confidence: 99%

Diffusion Models for Adversarial Purification

Nie, Guo, Huang, et al. 2022

Preprint

Adversarial purification refers to a class of defense methods that remove adversarial perturbations using a generative model. These methods do not make assumptions on the form of attack and the classification model, and thus can defend pre-existing classifiers against unseen threats. However, their performance currently falls behind adversarial training methods. In this work, we propose DiffPure that uses diffusion models for adversarial purification: Given an adversarial example, we first diffuse it with a small amount of noise following a forward diffusion process, and then recover the clean image through a reverse generative process. To evaluate our method against strong adaptive attacks in an efficient and scalable way, we propose to use the adjoint method to compute full gradients of the reverse generative process. Extensive experiments on three image datasets including CIFAR-10, ImageNet and CelebA-HQ with three classifier architectures including ResNet, WideResNet and ViT demonstrate that our method achieves the state-of-the-art results, outperforming current adversarial training and adversarial purification methods, often by a large margin. Project page: https://diffpure.github.io.
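The diffuse-then-denoise procedure described in this abstract can be sketched as follows. This is a minimal illustration, not the DiffPure implementation: it assumes the discrete-time DDPM forward process x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps with the linear noise schedule of Ho et al. (2020), treats an image as a flat list of pixel values, and leaves the reverse generative process as a caller-supplied function. The names `alpha_bar`, `forward_diffuse`, `purify`, `reverse_fn`, and `t_star` are hypothetical.

```python
import math
import random

def alpha_bar(t, betas):
    """Cumulative product of (1 - beta_s) for s = 1..t (equals 1.0 at t = 0)."""
    prod = 1.0
    for beta in betas[:t]:
        prod *= 1.0 - beta
    return prod

def forward_diffuse(x0, t, betas, rng):
    """Sample x_t from x_0 in closed form by adding scaled Gaussian noise."""
    a = alpha_bar(t, betas)
    signal_scale = math.sqrt(a)
    noise_scale = math.sqrt(1.0 - a)
    return [signal_scale * v + noise_scale * rng.gauss(0.0, 1.0) for v in x0]

def purify(x_adv, t_star, betas, reverse_fn, rng):
    """Diffuse the input up to step t_star, then hand it to the reverse process.

    reverse_fn(x_t, t_star) is a placeholder (an assumption of this sketch)
    for a trained diffusion model that maps x_t back toward a clean image.
    """
    x_t = forward_diffuse(x_adv, t_star, betas, rng)
    return reverse_fn(x_t, t_star)

# Linear schedule from 1e-4 to 0.02 over 1000 steps.
num_steps = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (num_steps - 1) for i in range(num_steps)]
```

A small `t_star` keeps most of the image content while adding a modest amount of noise; a larger `t_star` adds more noise and discards more of the original signal. In practice the pixel lists would be arrays from a numerical library and `reverse_fn` would run a trained model.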

“…Hence, rather than starting from random Gaussian noise as in [22], one can start from x_M and use a small number of iterations to achieve reconstruction, as introduced as the CCDF strategy in [24]. Accordingly, both the denoising and the SR steps of R2D2+ require a few tens of iterations, as opposed to other diffusion models, which require a few thousand iterations [16], [17], [22].…”

Section: Post-hoc Super-resolution (mentioning)

confidence: 99%

“…Recently, diffusion models [16], [17] have shown impressive progress in image generation [16]-[18], outperforming even the best-in-class generative adversarial networks (GAN). While diffusion models were first developed as generative models, these are now also being adopted to inverse problems including compressed sensing MRI [19]-[21], CT reconstruction [21], super-resolution [22]-[24], and much more. Two very appealing properties of diffusion models are as follows: 1) One can acquire results from posterior sampling, rather than a single MMSE estimate.…”

Section: Introduction (mentioning)

confidence: 99%

MR Image Denoising and Super-Resolution Using Regularized Reverse Diffusion

Chung, Lee, Ye 2022

Preprint

Patient scans from MRI often suffer from noise, which hampers the diagnostic capability of such images. As a method to mitigate such artifacts, denoising is largely studied both within the medical imaging community and beyond the community as a general subject. However, recent deep neural network-based approaches mostly rely on the minimum mean squared error (MMSE) estimates, which tend to produce a blurred output. Moreover, such models suffer when deployed in real-world situations: out-of-distribution data, and complex noise distributions that deviate from the usual parametric noise models. In this work, we propose a new denoising method based on score-based reverse diffusion sampling, which overcomes all the aforementioned drawbacks. Our network, trained only with coronal knee scans, excels even on out-of-distribution in vivo liver MRI data, contaminated with a complex mixture of noise. Even more, we propose a method to enhance the resolution of the denoised image with the same network. With extensive experiments, we show that our method establishes state-of-the-art performance, while having desirable properties which prior MMSE denoisers did not have: flexibly choosing the extent of denoising, and quantifying uncertainty.

“…Denoising diffusion probabilistic models (diffusion models for short) have achieved the state-of-the-art (SOTA) generation results in various tasks, including image [34,22,8,7,33,39,44] and super resolution image generation [13,31,41,25], text-to-image generation [23,11,14,28], text-to-speech synthesis [4,15,27,17,16,5] and speech enhancement [20,21,42]. Especially, in audio synthesis, diffusion models have shown strong ability in modelling both spectrogram features [27,17] and raw waveforms [4,15,5].…”

Section: Denoising Diffusion Probabilistic Models (mentioning)

confidence: 99%

BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis

Leng, Chen, Guo, et al. 2022

Preprint

Binaural audio plays a significant role in constructing immersive augmented and virtual realities. As it is expensive to record binaural audio from the real world, synthesizing it from mono audio has attracted increasing attention. This synthesis process involves not only the basic physical warping of the mono audio, but also room reverberations and head/ear related filtrations, which, however, are difficult to accurately simulate in traditional digital signal processing. In this paper, we formulate the synthesis process from a different perspective by decomposing the binaural audio into a common part that is shared by the left and right channels as well as a specific part that differs in each channel. Accordingly, we propose BinauralGrad, a novel two-stage framework equipped with diffusion models to synthesize them respectively. Specifically, in the first stage, the common information of the binaural audio is generated with a single-channel diffusion model conditioned on the mono audio, based on which the binaural audio is generated by a two-channel diffusion model in the second stage. Combining this novel perspective of two-stage synthesis with advanced generative models (i.e., the diffusion models), the proposed BinauralGrad is able to generate accurate and high-fidelity binaural audio samples. Experiment results show that on a benchmark dataset, BinauralGrad outperforms the existing baselines by a large margin in terms of both objective and subjective evaluation metrics (Wave L2: 0.128 vs. 0.157, MOS: 3.80 vs. 3.61). The generated audio samples are available online.
