“…While diffusion models have shown impressive results on generation, editing, and other tasks (see Section 2), their main drawback is their long inference time, due to the iterative diffusion process applied at the pixel level to generate each result. Some recent works [Gu et al. 2021; Esser et al. 2021b; Bond-Taylor et al. 2021] have thus proposed to perform the diffusion in a latent space with lower dimensionality and higher-level semantics than pixels, yielding competitive performance on various tasks with much lower training and inference times. In particular, Latent Diffusion Models (LDM) ] offer this appealing combination of competitive image quality and fast inference. However, this approach targets text-to-image generation from scratch, rather than global image manipulation, let alone local editing.…”