Recent advances in text-to-image generation with diffusion models present transformative capabilities in image quality. However, user controllability of the generated image and fast adaptation to new tasks remain open challenges, currently addressed mostly by costly and lengthy retraining and fine-tuning, or by ad-hoc adaptations to specific image generation tasks. In this work, we present MultiDiffusion, a unified framework that enables versatile and controllable image generation using a pre-trained text-to-image diffusion model, without any further training or fine-tuning. At the center of our approach is a new generation process, based on an optimization task that binds together multiple diffusion generation processes with a shared set of parameters or constraints. We show that MultiDiffusion can be readily applied to generate high-quality and diverse images that adhere to user-provided controls, such as a desired aspect ratio (e.g., panorama) and spatial guiding signals ranging from tight segmentation masks to bounding boxes. Project page is available at https://multidiffusion.github.io.
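To make the idea of binding several generation processes together more concrete, the following is a minimal one-dimensional sketch, not the paper's implementation. It assumes that each process denoises one overlapping window of a shared latent and that the results are reconciled by a per-position average. The names `fuse_windows` and `toy_denoise_step` are hypothetical, and the toy step merely stands in for a pre-trained diffusion model.

```python
def fuse_windows(latent, windows, denoise_step):
    """Run denoise_step on each window of latent and average the overlaps.

    latent: list of floats (a 1D stand-in for an image latent).
    windows: list of (start, end) index pairs, end exclusive.
    denoise_step: function mapping a crop (list) to an updated crop.
    """
    total = [0.0] * len(latent)
    count = [0] * len(latent)
    for start, end in windows:
        updated = denoise_step(latent[start:end])
        for offset, value in enumerate(updated):
            total[start + offset] += value
            count[start + offset] += 1
    # Positions covered by no window keep their original value.
    return [
        total[i] / count[i] if count[i] else latent[i]
        for i in range(len(latent))
    ]


def toy_denoise_step(crop):
    """Hypothetical stand-in for a model: move values halfway to the crop mean."""
    mean = sum(crop) / len(crop)
    return [0.5 * value + 0.5 * mean for value in crop]


latent = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
windows = [(0, 4), (2, 6)]  # two windows overlapping at positions 2 and 3
fused = fuse_windows(latent, windows, toy_denoise_step)
print(fused)  # [1.5, 2.5, 4.5, 5.5, 7.5, 8.5]
```

In this sketch, positions 2 and 3 receive the mean of the two window updates, so both windows agree on a single shared value there. A real setting would repeat this reconciliation at every denoising step over 2D crops.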
Project webpage: https://neural-congealing.github.io/
Figure 1 (panels: Input Images, Congealed Images, Edited Images). Given a set of input images, our method automatically detects and jointly aligns semantically-common content across the images. This is achieved through a test-time training approach that estimates a unified 2D atlas that represents the common semantic content, and dense mappings from the joint atlas to each of the input images. Our atlas and mappings are optimized per input set in a self-supervised manner by leveraging a pre-trained DINO-ViT model. Our method can be applied to diverse image sets, without requiring any additional training data, and allows us to automatically propagate an edit applied to a single image across the entire set.