Conditional Image-to-Image Translation

Lin, Jianxin; Xia, Yingce; Qin, Tao; Chen, Zhibo; Liu, Tie-Yan

doi:10.1109/cvpr.2018.00579

Cited by 145 publications

(106 citation statements)

References 21 publications

Citation statement types: Mentioning 106 (no counts shown for Supporting or Contrasting)

“…To generate controllable translation result, Lin et al [7] decompose the image latent space into domain-independent and domain-specific feature spaces, and raise a new problem named as conditional cross-domain translation which can assign domain-specific feature for generated result by feeding a conditional image in the target domain. Similar to [7], other two works [8], [9] proposed to disentangle latent space and generate diverse translation results. Choi et al [10] further proposed a StarGAN that can perform image-to-image translations for multiple domains using only a single model.…”

Section: Related Work (mentioning)

confidence: 99%

“…Therefore, we can leverage such a pre-trained CNN to extract the domain-specific features of an image. Compared with works [7], [8], [9] that use two separated domain-specific feature extractors for two domain translation, we utilize the domain classifier as a general domain-specific feature extractor which can be easily generalized to multi-domain translation. With the well-defined domain-specific features, the domain-independent features can be easily obtained by feature disentanglement.…”

Section: Introduction (mentioning)

confidence: 99%

“…Existing multi-domain translation models, such as StarGAN, also lack the ability to control the translated results in the target domain and their results usually lack of diversity in the sense that a fixed image usually leads to (almost) deterministic translation result. With two kinds of latent feature (i.e., domain-specific and domain-independent features) disentangled, we can devise a conditional DosGAN (briefly, DosGAN-c) for conditional image-to-image translation [7], which is to translate an image from the source domain to the target domain conditioned on a given image in the target domain and requires that the generated image should inherit some domain-specific features of the conditional image from the target domain. The DosGAN-c shares the same network architecture with DosGAN, and the difference between two frameworks only lies in the inputs and losses.…”

Section: Introduction (mentioning)

confidence: 99%

“…Compared with the previous work cd-GAN [7], there are three main differences: (1) Unlike cd-GAN or CycleGAN that simply treats multiple domains as different sources of images, in this work, we regard domain information as explicit supervision, where we train a classifier to classify the domain of the input image, and convert it to a domain-specific feature extractor. (2) The domain-specific features in our model are explicitly modeled, which are extracted by a well-defined and fixed domain classification network.…”

Section: Introduction (mentioning)

confidence: 99%

Exploring Explicit Domain Supervision for Latent Space Disentanglement in Unpaired Image-to-Image Translation

Lin, Chen, Xia, et al. 2021. IEEE Trans. Pattern Anal. Mach. Intell. (Self Cite)

Image-to-image translation tasks have been widely investigated with Generative Adversarial Networks (GANs). However, existing approaches are mostly designed in an unsupervised manner while little attention has been paid to domain information within unpaired data. In this paper, we treat domain information as explicit supervision and design an unpaired image-to-image translation framework, Domain-supervised GAN (DosGAN), which takes the first step towards the exploration of explicit domain supervision. In contrast to representing domain characteristics using different generators or domain codes, we pre-train a classification network to explicitly classify the domain of an image. After pre-training, this network is used to extract the domain-specific features of each image. Such features, together with the domain-independent features extracted by another encoder (shared across different domains), are used to generate image in target domain. Extensive experiments on multiple facial attribute translation, multiple identity translation, multiple season translation and conditional edges-to-shoes/handbags demonstrate the effectiveness of our method. In addition, we can transfer the domain-specific feature extractor obtained on the Facescrub dataset with domain supervision information to unseen domains, such as faces in the CelebA dataset. We also succeed in achieving conditional translation with any two images in CelebA, while previous models like StarGAN cannot handle this task.

Index Terms: image-to-image translation, explicit domain supervision, generative adversarial networks.
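The data flow described in this abstract (domain-specific features taken from one image, domain-independent features taken from another, both passed to a generator) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the three functions are hypothetical stand-ins that work on short lists of numbers, not the DosGAN networks, and "mean brightness" is simply a placeholder for whatever a real domain classifier would encode.

```python
from typing import List, Sequence

Vector = List[float]


def domain_specific_features(image: Sequence[float]) -> Vector:
    """Hypothetical stand-in for features read from a pre-trained domain classifier.

    Toy assumption: the only domain-specific property is the mean brightness.
    """
    return [sum(image) / len(image)]


def domain_independent_features(image: Sequence[float]) -> Vector:
    """Hypothetical stand-in for the shared encoder.

    Toy assumption: the content is the image with its mean brightness removed.
    """
    mean = sum(image) / len(image)
    return [pixel - mean for pixel in image]


def generate(content: Vector, style: Vector) -> Vector:
    """Hypothetical generator: re-applies the domain-specific mean to the content."""
    return [value + style[0] for value in content]


def conditional_translate(source: Sequence[float],
                          conditional: Sequence[float]) -> Vector:
    """Keep the content of `source`, take domain-specific features from `conditional`."""
    content = domain_independent_features(source)
    style = domain_specific_features(conditional)
    return generate(content, style)


if __name__ == "__main__":
    source_image = [0.0, 1.0, 2.0]          # toy image from the source domain
    conditional_image = [10.0, 10.0, 13.0]  # toy image from the target domain
    print(conditional_translate(source_image, conditional_image))
```

In this toy setting the output keeps the relative structure of the source image while taking its brightness from the conditional image, which mirrors the role the two feature types play in the abstract.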

“…Unpaired translations Various works extended this idea to the case where no explicit input-output image pairs are available (unpaired image translation), using the idea of cyclic consistency [31,72,79,41] or consistency between certain extracted features [63]. To avoid accidental artifacts and improve learning, Mejjati et al [48] integrate an attention mechanism to help translations focus on semantically meaningful regions.…”

Section: Paired Translations (mentioning)

confidence: 99%

Mix and Match Networks: Encoder-Decoder Alignment for Zero-Pair Image Translation

Wang, Weijer, Herranz. 2018. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

This paper addresses the problem of inferring unseen cross-domain and cross-modal image-to-image translations between multiple domains and modalities. We assume that only some of the pairwise translations have been seen (i.e. trained) and infer the remaining unseen translations (where training pairs are not available). We propose mix and match networks, an approach where multiple encoders and decoders are aligned in such a way that the desired translation can be obtained by simply cascading the source encoder and the target decoder, even when they have not interacted during the training stage (i.e. unseen). The main challenge lies in the alignment of the latent representations at the bottlenecks of encoder-decoder pairs. We propose an architecture with several tools to encourage alignment, including autoencoders and robust side information and latent consistency losses. We show the benefits of our approach in terms of effectiveness and scalability compared with other pairwise image-to-image translation approaches. We also propose zero-pair cross-modal image translation, a challenging setting where the objective is inferring semantic segmentation from depth (and vice-versa) without explicit segmentation-depth pairs, and only from two (disjoint) segmentation-RGB and depth-segmentation training sets. We observe that certain part of the shared information between unseen domains might not be reachable, so we further propose a variant that leverages pseudo-pairs to exploit all shared information.
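The cascading idea in this abstract (one encoder and one decoder per domain, aligned to a shared latent space, so that any source encoder can be chained with any target decoder) can be sketched as follows. This is a toy under stated assumptions, not the paper's architecture: each "domain" is a scalar related to the latent value by a hypothetical scale factor, so alignment holds by construction rather than being learned.

```python
from typing import Callable, Dict

# Hypothetical per-domain scale factors; in this toy, a domain value equals
# latent * scale, so every encoder maps into the same shared latent space.
DOMAIN_SCALES: Dict[str, float] = {
    "rgb": 2.0,
    "depth": 4.0,
    "segmentation": 0.5,
}


def make_encoder(scale: float) -> Callable[[float], float]:
    """Build an encoder mapping a domain value into the shared latent space."""
    def encode(value: float) -> float:
        return value / scale
    return encode


def make_decoder(scale: float) -> Callable[[float], float]:
    """Build a decoder mapping a latent value back into a domain."""
    def decode(latent: float) -> float:
        return latent * scale
    return decode


# One encoder and one decoder per domain, never one model per domain pair.
ENCODERS = {name: make_encoder(scale) for name, scale in DOMAIN_SCALES.items()}
DECODERS = {name: make_decoder(scale) for name, scale in DOMAIN_SCALES.items()}


def translate(value: float, source: str, target: str) -> float:
    """Cascade the source encoder and the target decoder."""
    return DECODERS[target](ENCODERS[source](value))


if __name__ == "__main__":
    # No depth-to-segmentation model is defined anywhere above; the translation
    # exists only because both domains share the latent space.
    print(translate(8.0, "depth", "segmentation"))
```

The design point the sketch tries to show is scalability: with N domains it needs N encoders and N decoders rather than a separate model for each of the N(N-1) ordered pairs, provided the latent representations are aligned.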
