“…As image inpainting requires a high-level semantic context, and to explicitly include it in the generation pipeline, there exist hand-crafted architectural designs such as Dilated Convolutions [13,38] to increase the receptive field, Partial Convolutions [16] and Gated Convolutions [41] to guide the convolution kernel according to the inpainted mask, Contextual Attention [39] to leverage on global information, Edges maps [7,22,36,37] or Semantic Segmentation maps [11,25] to further guide the generation, and Fourier Convolutions [32] to include both global and local information efficiently. Although recent works produce photo-realistic results, GANs are well known for textural synthesis, so these methods shine on background completion or removing objects, which require repetitive structural synthesis, and struggle with semantic synthesis (See Figure 5).…”