SESAME: Semantic Editing of Scenes by Adding, Manipulating or Erasing Objects

Ntavelis, Evangelos; Romero, Andrés; Kastanis, Iason; Gool, Luc Van; Timofte, Radu

doi:10.1007/978-3-030-58542-6_24

Cited by 68 publications

(52 citation statements)

References 36 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…As image inpainting requires a high-level semantic context, and to explicitly include it in the generation pipeline, there exist hand-crafted architectural designs such as Dilated Convolutions [13,38] to increase the receptive field, Partial Convolutions [16] and Gated Convolutions [41] to guide the convolution kernel according to the inpainted mask, Contextual Attention [39] to leverage on global information, Edges maps [7,22,36,37] or Semantic Segmentation maps [11,25] to further guide the generation, and Fourier Convolutions [32] to include both global and local information efficiently. Although recent works produce photo-realistic results, GANs are well known for textural synthesis, so these methods shine on background completion or removing objects, which require repetitive structural synthesis, and struggle with semantic synthesis (See Figure 5).…”

Section: Related Workmentioning

confidence: 99%

RePaint: Inpainting using Denoising Diffusion Probabilistic Models

Lugmayr¹,

Danelljan²,

Romero³

et al. 2022

Preprint

Self Cite

View full text Add to dashboard Cite

Figure 1. We use Denoising Diffusion Probabilistic Models (DDPM) for inpainting. The process is conditioned on the masked input (left). It starts from a random Gaussian noise sample that is iteratively denoised until it produces a high-quality output. Since this process is stochastic, we can sample multiple diverse outputs. The DDPM prior forces a harmonized image, is able to reproduce texture from other regions, and inpaint semantically meaningful content.

show abstract

Section: Related Workmentioning

confidence: 99%

RePaint: Inpainting using Denoising Diffusion Probabilistic Models

Lugmayr¹,

Danelljan²,

Romero³

et al. 2022

Preprint

Self Cite

View full text Add to dashboard Cite

show abstract

“…The arise of generative adversarial network (GANs) brings revolutionary advance to image editing [4], [20], [52], [55], [67], [68], [94]. As one of the most intuitive representation in image editing, semantic information has been extensively investigated in conditional image synthesis.…”

Section: Semantic Image Editingmentioning

confidence: 99%

Bi-level Feature Alignment for Versatile Image Translation and Manipulation

Zhan¹,

Yu²,

Wu³

et al. 2021

Preprint

View full text Add to dashboard Cite

Generative adversarial networks (GANs) have achieved great success in image translation and manipulation. However, high-fidelity image generation with faithful style control remains a grand challenge in computer vision. This paper presents a versatile image translation and manipulation framework that achieves accurate semantic and style guidance in image generation by explicitly building a correspondence. To handle the quadratic complexity incurred by building the dense correspondences, we introduce a bi-level feature alignment strategy that adopts a top-k operation to rank block-wise features followed by dense attention between block features which reduces memory cost substantially. As the top-k operation involves index swapping which precludes the gradient propagation, we propose to approximate the non-differentiable top-k operation with a regularized earth mover's problem so that its gradient can be effectively back-propagated. In addition, we design a novel semantic position encoding mechanism that builds up coordinate for each individual semantic region to preserve texture structures while building correspondences. Further, we design a novel confidence feature injection module which mitigates mismatch problem by fusing features adaptively according to the reliability of built correspondences. Extensive experiments show that our method achieves superior performance qualitatively and quantitatively as compared with the state-of-the-art. The code is available at https://github.com/fnzhan/RABIT.

show abstract

“…Conditional normalization is the workhorse of many methods solving diverse problems in multi-domain learning [46,47], image generation [23,9,43], image editing [42], style transfer [19], and super-resolution [56]. The operating principle is as simple as applying condition-dependent affine transformations on the normalized batch [21], local response [26], instance [52], layer [4], or feature group [58], allowing features to occupy different regions in the space, depending on the triggered condition.…”

Section: Conditional Normalization (Cn)mentioning

confidence: 99%

CompositeTasking: Understanding Images by Spatial Composition of Tasks

Popović

Paudel

Probst

et al. 2021

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Self Cite

View full text Add to dashboard Cite

We define the concept of CompositeTasking as the fusion of multiple, spatially distributed tasks, for various aspects of image understanding. Learning to perform spatially distributed tasks is motivated by the frequent availability of only sparse labels across tasks, and the desire for a compact multi-tasking network. To facilitate CompositeTasking, we introduce a novel task conditioning model -a single encoder-decoder network that performs multiple, spatially varying tasks at once. The proposed network takes an image and a set of pixel-wise dense task requests as inputs, and performs the requested prediction task for each pixel. Moreover, we also learn the composition of tasks that needs to be performed according to some CompositeTasking rules, which includes the decision of where to apply which task. It not only offers us a compact network for multitasking, but also allows for task-editing. Another strength of the proposed method is demonstrated by only having to supply sparse supervision per task. The obtained results are on par with our baselines that use dense supervision and a multi-headed multi-tasking design. The source code will be made publicly available at www.github.com/ nikola3794/composite-tasking.

show abstract

SESAME: Semantic Editing of Scenes by Adding, Manipulating or Erasing Objects

Cited by 68 publications

References 36 publications

RePaint: Inpainting using Denoising Diffusion Probabilistic Models

RePaint: Inpainting using Denoising Diffusion Probabilistic Models

Bi-level Feature Alignment for Versatile Image Translation and Manipulation

CompositeTasking: Understanding Images by Spatial Composition of Tasks

Contact Info

Product

Resources

About