Recent deep learning based image editing methods have achieved promising results for removing objects from an image, but they fail to produce plausible results when the removed object is large and complex, especially in facial images. The objective of this work is to remove mask objects from facial images. This problem is challenging because (1) facial masks typically cover a large region of the face, often extending beyond the face boundary below the chin, and (2) paired facial images with and without the mask object do not exist for training. We break the problem into two stages: mask object detection and image completion of the removed mask region. The first stage of our model automatically produces a binary segmentation of the mask region. The second stage then removes the mask and synthesizes the affected region with fine details while retaining the global coherency of the face structure. For this, we employ a GAN-based network with two discriminators: one helps the network learn the global structure of the face, while the other focuses learning on the missing region. To train our model in a supervised manner, we create a paired synthetic dataset from the publicly available CelebA dataset and evaluate on real-world images collected from the Internet. Our model outperforms representative state-of-the-art approaches both qualitatively and quantitatively.
INDEX TERMS: Generative adversarial network, object removal, image editing.
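The two-discriminator objective described above can be sketched as follows. This is a minimal PyTorch illustration under stated assumptions: the names global_d, local_d, and the BCE-with-logits formulation are illustrative choices, not details taken from the paper.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def generator_adv_loss(global_d, local_d, completed, mask_bbox):
    """Generator-side adversarial loss with a whole-image (global) critic
    and a critic focused on the filled-in mask region (local)."""
    y0, y1, x0, x1 = mask_bbox                     # bounding box of the mask region
    local_patch = completed[:, :, y0:y1, x0:x1]    # crop the inpainted area
    real = torch.ones(completed.size(0), 1, device=completed.device)
    # Both critics are assumed to output (B, 1) real/fake logits
    loss_global = bce(global_d(completed), real)   # global face structure
    loss_local = bce(local_d(local_patch), real)   # details in the missing region
    return loss_global + loss_local
```

Summing the two terms lets the generator receive gradient both from the whole face and from the hole region, which is the intuition behind pairing a global with a local discriminator.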
Removing a specific object from an image and replacing the hole left behind with a visually plausible background is an intriguing task. While recent deep learning based object removal methods have shown promising results on some structured scenes, none of them address object removal in facial images. The objective of this work is to remove a microphone from facial images and fill the hole with correct facial semantics and fine details. To make our solution practically useful, we present an interactive method called MRGAN, in which the user roughly marks the microphone region. To fill the hole, we employ a Generative Adversarial Network based image-to-image translation approach. We break the problem into two stages: an inpainter and a refiner. The inpainter produces a coarse prediction by roughly filling in the microphone region, and the refiner then synthesizes fine details under the microphone region. We combine perceptual, reconstruction, and adversarial losses into a joint loss function for generating a realistic face whose structure is similar to the ground truth. Because facial image pairs with and without a microphone do not exist, we train our method on a microphone dataset synthetically generated from CelebA face images and evaluate on real-world microphone images. Our extensive evaluation shows that MRGAN performs better than state-of-the-art image manipulation methods on real microphone images, even though it is trained only on the synthetic dataset. Additionally, we provide ablation studies for the joint loss function and for different network arrangements.
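As a rough illustration of such a joint objective, the following PyTorch sketch combines the three loss terms. The loss weights and the frozen VGG16 feature extractor used for the perceptual term are common-practice assumptions, not values reported by the paper.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

l1 = nn.L1Loss()
bce = nn.BCEWithLogitsLoss()
# Frozen early VGG16 layers, a common choice for the perceptual term
vgg_feats = vgg16(weights="IMAGENET1K_V1").features[:16].eval()
for p in vgg_feats.parameters():
    p.requires_grad_(False)

def joint_loss(pred, target, d_logits, w_rec=1.0, w_perc=0.1, w_adv=0.01):
    rec = l1(pred, target)                          # pixel-wise reconstruction
    perc = l1(vgg_feats(pred), vgg_feats(target))   # perceptual (feature) distance
    adv = bce(d_logits, torch.ones_like(d_logits))  # fool the discriminator
    return w_rec * rec + w_perc * perc + w_adv * adv
```

The reconstruction term anchors the output to the ground truth, the perceptual term encourages similar high-level structure, and the adversarial term pushes the result toward the manifold of realistic faces.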
This research presents a user-friendly method for face de-occlusion in facial images in which the user controls which object to remove. Our system removes one object at a time; however, it can remove multiple objects through repeated application. Although we demonstrate our system on five commonly occurring occluding objects (hands, medical masks, microphones, sunglasses, and eyeglasses), more object types can be handled with the proposed methodology. In the first stage, our model learns to detect a user-selected, possibly distracting, object. The second stage then removes the object, using the detection information from the first stage as guidance. To achieve this, we employ GAN-based networks in both stages. Specifically, in the second stage, we integrate both partial and vanilla convolution operations in the generator of the GAN. We show that with this integration, the proposed network learns a well-incorporated structure and also overcomes visual discrepancies in the affected region of the face. To train our network, we produce a paired synthetic occluded-face dataset. Our model is evaluated on real-world images collected from the Internet and on the publicly available CelebA and CelebA-HQ datasets. Experimental results confirm our model's effectiveness in removing challenging foreground non-face objects from facial images compared to existing representative state-of-the-art approaches.
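For reference, a partial-convolution layer of the kind integrated with vanilla convolutions is sketched below, following the standard formulation of Liu et al. (2018). The exact placement of such layers in the paper's generator is not reproduced here; this is a self-contained PyTorch sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartialConv2d(nn.Module):
    """Convolve only over valid (unmasked) pixels, renormalizing the output
    by how much of each window was valid."""
    def __init__(self, in_ch, out_ch, k=3, stride=1, pad=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, stride, pad, bias=False)
        # Fixed all-ones kernel used only to count valid pixels per window
        self.register_buffer("ones", torch.ones(1, 1, k, k))
        self.stride, self.pad = stride, pad

    def forward(self, x, mask):
        # mask: (B, 1, H, W), 1 for valid pixels, 0 inside the hole
        out = self.conv(x * mask)
        valid = F.conv2d(mask, self.ones, stride=self.stride, padding=self.pad)
        out = out * (self.ones.numel() / valid.clamp(min=1.0))  # renormalize
        new_mask = (valid > 0).float()  # the hole shrinks after each layer
        return out, new_mask
```

Because the mask is updated and propagated layer by layer, early layers ignore the hole entirely, while later layers gradually fill it in; mixing such layers with vanilla convolutions is one way to balance hole-aware filtering with full-image context.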
In the real world, different artists draw sketches of the same person with different artistic styles in both texture and shape. Our goal is to synthesize realistic face sketches in multiple styles while retaining the identity of the input face, using only a single network. To achieve this, we employ a modified conditional GAN that takes a target style label as input. Our method synthesizes multiple sketch styles even though it is based on a single network. Sketches produced by our method are comparable in quality to those of state-of-the-art sketch synthesis methods that use multiple networks.
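One common way to feed a target style label to a single conditional generator is to tile a one-hot label as extra input channels. The sketch below illustrates that scheme as an assumption, since the abstract does not specify the exact conditioning mechanism; the function name and shapes are hypothetical.

```python
import torch

def condition_on_style(face, style_id, num_styles):
    """Append tiled one-hot style channels so one generator can target
    any of the learned sketch styles."""
    b, _, h, w = face.shape
    onehot = torch.zeros(b, num_styles, h, w, device=face.device)
    onehot[torch.arange(b), style_id] = 1.0   # one channel per sketch style
    return torch.cat([face, onehot], dim=1)   # generator input: image + label

# e.g., x = condition_on_style(photo, torch.tensor([2]), num_styles=3)
```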