Object-Centric Slot Diffusion

Jiang, Jindong; Deng, Feiqi; Singh, Gautam; Ahn, Sungjin

doi:10.48550/arxiv.2303.10834

Cited by 1 publication

(1 citation statement)

References 40 publications


“…Text-to-image diffusion models. Diffusion models [10,19,21,41,58–62] have proven to be highly effective in learning data distributions and have shown impressive results in image synthesis, leading to various applications [8,26,27,29,31,32,36,46,56,74]. Recent advancements have also explored transformer-based architectures [6,45,67].…”

Section: Related Work (mentioning)

confidence: 99%

SVDiff: Compact Parameter Space for Diffusion Fine-Tuning

Han, Li, Zhang et al. 2023

Preprint


Diffusion models have achieved remarkable success in text-to-image generation, enabling the creation of high-quality images from text prompts or other modalities. However, existing methods for customizing these models are limited by handling multiple personalized subjects and the risk of overfitting. Moreover, their large number of parameters is inefficient for model storage. In this paper, we propose a novel approach to address these limitations in existing text-to-image diffusion models for personalization. Our method involves fine-tuning the singular values of the weight matrices, leading to a compact and efficient parameter space that reduces the risk of overfitting and language-drifting. We also propose a Cut-Mix-Unmix data-augmentation technique to enhance the quality of multi-subject image generation and a simple text-based image editing framework. Our proposed SVDiff method has a significantly smaller model size (1.7MB for Stable Diffusion) compared to existing methods (vanilla DreamBooth 3.66GB, Custom Diffusion 73MB), making it more practical for real-world applications.
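The idea of fine-tuning only the singular values of a weight matrix can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `decompose` and `rebuild`, the additive shift `delta`, the clamp at zero, and the 64×32 matrix size are all assumptions made for illustration. The abstract states only that the singular values are what gets fine-tuned.

```python
import numpy as np


def decompose(weight):
    """Factor a 2-D weight matrix once; U and Vt stay frozen afterwards."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    return u, s, vt


def rebuild(u, s, vt, delta):
    """Recombine the frozen factors with shifted singular values.

    Clamping at zero keeps the shifted values non-negative
    (an assumption of this sketch, not taken from the abstract).
    """
    shifted = np.maximum(s + delta, 0.0)
    # Scaling the columns of U by `shifted` equals U @ diag(shifted).
    return (u * shifted) @ vt


# Hypothetical layer: a 64x32 weight matrix with random entries.
rng = np.random.default_rng(0)
weight = rng.standard_normal((64, 32))

u, s, vt = decompose(weight)
delta = np.zeros_like(s)  # the only values a fine-tuning run would update

restored = rebuild(u, s, vt, delta)
trainable = delta.size  # one value per singular value
full = weight.size      # every entry of the original matrix
```

With a zero shift, `rebuild` reproduces the original matrix, so fine-tuning starts from the pretrained weights. In this toy layer only 32 values would be trained and stored instead of 2,048 matrix entries, which is the kind of reduction the abstract's model-size comparison relies on.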
