2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022
DOI: 10.1109/cvpr52688.2022.01118
Perception Prioritized Training of Diffusion Models

Cited by 98 publications (26 citation statements)
References 11 publications
“…Fig. 1 shows the restoration results of real artifacts generated by GAN [17], GPT [8], and DDPM [3] models, respectively. DiffGAR exhibits better image restoration capabilities than the other models on all three kinds of generative artifacts.…”
Section: Comparison Results (mentioning, confidence: 99%)
“…It has been observed in [1,5,6] that low-frequency information, i.e., coarse features such as pose and facial shape, is learned in the earlier timesteps (e.g., 0 < SNR(t) < 10^-2), while high-frequency information such as fine-grained features and imperceptible details is encoded in the later timesteps (e.g., 10^0 < SNR(t) < 10^4) of the reverse diffusion process. Here, SNR(t) stands for the signal-to-noise ratio per timestep [26].…”
Section: Timestep Scheduling For Conditioning (mentioning, confidence: 88%)
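The per-timestep SNR referenced above can be computed directly from a diffusion noise schedule. A minimal sketch, assuming the common DDPM linear beta schedule (beta from 1e-4 to 0.02 over T = 1000 steps; these defaults are an assumption, not stated in the quoted text):

```python
import numpy as np

# Linear beta schedule (assumed defaults for illustration).
T = 1000
betas = np.linspace(1e-4, 0.02, T)

# alpha_bar_t = prod_{s<=t} (1 - beta_s): fraction of signal remaining.
alphas_bar = np.cumprod(1.0 - betas)

# SNR(t) = alpha_bar_t / (1 - alpha_bar_t): large for nearly-clean
# early timesteps, approaching 0 as t -> T (nearly pure noise).
snr = alphas_bar / (1.0 - alphas_bar)
```

Under this schedule SNR decreases monotonically in t, spanning roughly the 10^4-to-10^-4 range that the quoted intervals refer to.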
“…However, their approach requires costly optimization during inference and does not support controlling the final generation. Inductive Bias of Diffusion Models: Building on the inductive bias [5,6] of diffusion models, eDiffi [1] proposed training models specialized to subsets of the timesteps to drastically improve generations. MagicMix [33] interpolates noise maps while providing different embeddings at different timesteps.…”
Section: Inference Only Editing (mentioning, confidence: 99%)
“…To tackle this data consistency problem and enforce the model to focus more on morphological patterns, they first feed input images into a color normalization module [124] to unify the domain of all images. In addition, they apply a morphology-levels prioritization module [125] that assigns higher weight values to the loss at earlier levels, to emphasize perceptual information, and lower weights to the loss at later levels, resulting in higher …”
[Figure 13 caption, spilled into the quote: Visual comparison of DDM [52], VM [120], and VM-Diff [122] for generating temporal cardiac images. The deformed intermediate frames S(φ) (right) are constructed using the source and target, and produce the deformation fields φ (left).]
Section: Image Generation (mentioning, confidence: 99%)
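The level-prioritized loss weighting described in that excerpt can be sketched as an SNR-based reweighting in the spirit of perception-prioritized training: down-weight high-SNR timesteps (nearly-clean images, imperceptible detail) so that low-SNR timesteps carrying coarse perceptual content dominate the objective. The weighting form 1/(k + SNR)^gamma and the constants k = 1, gamma = 1 below are illustrative assumptions:

```python
import numpy as np

def p2_weight(snr, k=1.0, gamma=1.0):
    """SNR-based loss weight: small where SNR is high (fine detail),
    larger where SNR is low (coarse, perceptual content)."""
    return 1.0 / (k + snr) ** gamma

# SNR under an assumed DDPM linear beta schedule, as a concrete input.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)
snr = alphas_bar / (1.0 - alphas_bar)

# Weights grow toward t = T: noisier timesteps, which shape coarse
# structure, receive more emphasis than nearly-clean ones.
w = p2_weight(snr)
```

Since SNR(t) decreases monotonically with t under this schedule, the weights increase monotonically, which is exactly the "higher weight at earlier (coarser) denoising levels" behavior the quoted module aims for.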