“…In particular, image synthesis tasks involving persons or portraits [6,27,28] can be applied in a wide variety of scenarios, such as advertising, games, and motion capture. Most real-world image synthesis tasks, e.g., face editing [18,1,38], pose guiding [34,53,45], and image inpainting [49,29,47,51], involve only local generation, i.e., generating pixels in certain regions while maintaining semantic consistency. Unfortunately, most existing works can only handle images with an 'iconic-view' foreground, rather than synthesizing images with a 'non-iconic-view' foreground [23], which is the focus of this paper.…”