2022
DOI: 10.48550/arxiv.2204.11824
Preprint

Retrieval-Augmented Diffusion Models

Abstract: Generative image synthesis with diffusion models has recently achieved excellent visual quality in several tasks such as text-based or class-conditional image synthesis. Much of this success is due to a dramatic increase in the computational capacity invested in training these models. This work presents an alternative approach: inspired by its successful application in natural language processing, we propose to complement the diffusion model with a retrieval-based approach and to introduce an explicit memory i…

Cited by 8 publications (6 citation statements)
References 29 publications
“…In natural language processing, several works augment large language models with external data encoded with structured language and relation representations [6,23,33,39,43,53,83]. Motivated by retrieval-augmented models in NLP, several recent works leverage visual and/or textual knowledge to improve classification [44], question answering [13,47,71,77], image generation [5,14,64,90], and multi-modal tasks simultaneously [79]. RAC [44] improves long-tail classification by retrieving from a non-parametric memory consisting of pre-encoded images and text.…”
Section: Related Work
confidence: 99%
“…Aiming to understand compositional concepts in scenes, Liu et al. [175] propose a compositional architecture for diffusion-based image synthesis which generates an image by composing a set of diffusion models. Observing that much of the success of diffusion models is due to a dramatic increase in training cost, Blattmann et al. [176] propose to complement the diffusion model with a retrieval-based approach that incurs low computational cost. Discrete Diffusion.…”
Section: Conditional Diffusion Models
confidence: 99%
“…However, this assumption does not hold when the entity is rare, or when the desired style differs greatly from the training style. To mitigate the significant out-of-distribution performance drop, multiple works [70], [71], [72], [73] have utilized the technique of retrieval from an external database used as a memory. Such a technique first gained attention in NLP [74], [75], [76], [77], [78] and, more recently, in GAN-based image synthesis [79], by turning fully parametric models into semi-parametric ones.…”
Section: Retrieval For Out-of-distribution
confidence: 99%
“…Such a technique first gained attention in NLP [74], [75], [76], [77], [78] and, more recently, in GAN-based image synthesis [79], by turning fully parametric models into semi-parametric ones. Motivated by this, [70] has augmented diffusion models with retrieval. A retrieval-augmented diffusion model (RDM) [70] consists of a conditional DM and an image database which is interpreted as an explicit part of the model.…”
Section: Retrieval For Out-of-distribution
confidence: 99%
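The retrieval step described above — querying an external image database treated as an explicit part of the model — can be sketched minimally. The sketch below assumes the common setup of L2-normalized embeddings (e.g. from an image encoder such as CLIP), where cosine similarity reduces to a dot product; the function name and toy data are illustrative, not the paper's actual implementation.

```python
import numpy as np

def retrieve_neighbors(query_emb, database_embs, k=4):
    """Return indices of the k database entries closest to the query.

    Assumes all embeddings are L2-normalized, so cosine similarity
    is just a dot product.
    """
    sims = database_embs @ query_emb   # (N,) cosine similarities
    return np.argsort(-sims)[:k]       # indices of the top-k entries

# Toy example: a database of 5 unit vectors and a query near entry 2.
rng = np.random.default_rng(0)
db = rng.normal(size=(5, 3))
db /= np.linalg.norm(db, axis=1, keepdims=True)
query = db[2] + 0.01 * rng.normal(size=3)
query /= np.linalg.norm(query)
neighbors = retrieve_neighbors(query, db, k=2)
# In an RDM-style pipeline, the embeddings db[neighbors] would then be
# fed to the conditional diffusion model (e.g. via cross-attention).
```

Swapping or extending the database at inference time, without retraining the diffusion model itself, is what makes the semi-parametric design attractive for rare entities and out-of-distribution styles.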