2021
DOI: 10.48550/arxiv.2111.12701
Preprint
Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes

Abstract: Whilst diffusion probabilistic models can generate high quality image content, key limitations remain in terms of both generating high-resolution imagery and their associated high computational requirements. Recent Vector-Quantized image models have overcome this limitation of image resolution but are prohibitively slow and unidirectional as they generate tokens via element-wise autoregressive sampling from the prior. By contrast, in this paper we propose a novel discrete diffusion probabilistic model prior wh…

Cited by 1 publication (1 citation statement)
References 37 publications
“…While diffusion models have shown impressive results on generation, editing, and other tasks (see Section 2), their main drawback is their long inference times, due to the iterative diffusion process that is applied at the pixel level to generate each result. Some recent works [Gu et al. 2021; Esser et al. 2021b; Bond-Taylor et al. 2021] have thus proposed to perform the diffusion on a latent space with lower dimensionality and higher-level semantics compared to pixels, yielding competitive performance on various tasks with much lower training and inference times. In particular, Latent Diffusion Models (LDM) offer this appealing combination of competitive image quality with fast inference times; however, this approach targets text-to-image generation from scratch, rather than global image manipulation, let alone local editing.…”
Section: Introduction
confidence: 99%
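The speed-up described in this citation statement comes from running the iterative denoising loop over a much smaller latent tensor instead of the full image. A rough back-of-the-envelope sketch of the per-step cost difference (the 256×256 image size, 8× downsampling factor, and 4-channel latent are illustrative assumptions, not figures taken from the cited papers):

```python
# Rough cost comparison: number of tensor elements one denoising step must
# process when diffusing in pixel space vs. in a smaller latent space.
# All sizes below are illustrative assumptions, not values from the papers.

def elements_per_step(height: int, width: int, channels: int) -> int:
    """Elements a single denoising step touches for a tensor of this shape."""
    return height * width * channels

# Pixel-space diffusion: full-resolution RGB image.
pixel_cost = elements_per_step(256, 256, 3)

# Latent-space diffusion: an autoencoder downsamples 8x per spatial side,
# producing a 4-channel latent (a common configuration, assumed here).
latent_cost = elements_per_step(256 // 8, 256 // 8, 4)

print(pixel_cost, latent_cost, pixel_cost / latent_cost)  # 196608 4096 48.0
```

Since this cost is paid at every one of the (typically hundreds of) diffusion steps, the roughly 48× smaller per-step workload in this sketch compounds into the large training- and inference-time savings the statement attributes to latent-space approaches.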