We propose a novel hierarchical approach for text-to-image synthesis by inferring semantic layout. Instead of learning a direct mapping from text to image, our algorithm decomposes the generation process into multiple steps: it first constructs a semantic layout from the text with a layout generator and then converts the layout to an image with an image generator. The proposed layout generator progressively constructs the semantic layout in a coarse-to-fine manner by generating object bounding boxes and refining each box by estimating the object shape inside it. The image generator synthesizes an image conditioned on the inferred semantic layout, which provides a useful semantic structure of the image that matches the text description. Our model not only generates semantically more meaningful images, but also allows automatic annotation of generated images and a user-controlled generation process through modification of the generated scene layout. We demonstrate the capability of the proposed model on the challenging MS-COCO dataset and show that it substantially improves image quality, interpretability of the output, and semantic alignment to the input text over existing approaches.
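The coarse-to-fine pipeline described above can be summarized in a minimal sketch. The module names (box_generator, shape_generator, image_generator) and their interfaces are assumptions for illustration, not the authors' actual implementation.

```python
import torch.nn as nn

class HierarchicalText2Image(nn.Module):
    """Sketch of a two-stage text-to-image pipeline: text -> semantic layout -> image.
    All sub-module names and signatures below are hypothetical."""

    def __init__(self, box_generator, shape_generator, image_generator):
        super().__init__()
        self.box_generator = box_generator      # text embedding -> object bounding boxes + classes
        self.shape_generator = shape_generator  # boxes (+ text) -> per-object shape masks
        self.image_generator = image_generator  # semantic layout (+ text) -> image

    def forward(self, text_embedding):
        # Step 1: coarse layout as a set of object bounding boxes with class labels.
        boxes, classes = self.box_generator(text_embedding)
        # Step 2: refine each box into an object shape mask; the stacked masks
        # form the semantic layout.
        layout = self.shape_generator(boxes, classes, text_embedding)
        # Step 3: synthesize the image conditioned on the inferred semantic layout.
        return self.image_generator(layout, text_embedding)
```

Because the intermediate layout is explicit, it can be inspected as an automatic annotation of the generated image or edited by a user before the final image is synthesized.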
We propose a simple yet highly effective method that addresses the mode-collapse problem in the conditional Generative Adversarial Network (cGAN). Although conditional distributions are multi-modal (i.e., they have many modes) in practice, most cGAN approaches tend to learn an overly simplified distribution in which an input is always mapped to a single output regardless of variations in the latent code. To address this issue, we propose to explicitly regularize the generator to produce diverse outputs depending on latent codes. The proposed regularization is simple, general, and can be easily integrated into most conditional GAN objectives. Additionally, explicit regularization of the generator allows our method to control the balance between visual quality and diversity. We demonstrate the effectiveness of our method on three conditional generation tasks: image-to-image translation, image inpainting, and future video prediction. We show that simply adding our regularization to existing models leads to surprisingly diverse generations, substantially outperforming previous approaches specifically designed for multi-modal conditional generation in each individual task.
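One plausible form of such a diversity regularizer is sketched below: it penalizes the generator when two different latent codes produce nearly identical outputs for the same conditioning input. The function name, the clamping threshold tau, and the loss weight lambda_div are assumptions for illustration, not the paper's exact formulation.

```python
import torch

def diversity_regularizer(generator, x, z_dim, tau=10.0):
    """Hypothetical sketch: encourage outputs to vary with the latent code by
    rewarding a large output distance relative to the latent distance."""
    z1 = torch.randn(x.size(0), z_dim, device=x.device)
    z2 = torch.randn(x.size(0), z_dim, device=x.device)
    y1, y2 = generator(x, z1), generator(x, z2)
    # Ratio of output distance to latent distance, averaged over the batch;
    # clamping with tau keeps the term bounded so it can be balanced against the GAN loss.
    num = (y1 - y2).abs().mean(dim=[1, 2, 3])
    den = (z1 - z2).abs().mean(dim=1) + 1e-8
    return -torch.clamp(num / den, max=tau).mean()

# Usage (assumed names): the weight lambda_div trades off visual quality against diversity.
# g_loss = adversarial_loss + lambda_div * diversity_regularizer(G, x, z_dim)
```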
Figure 1: GETAvatar generates controllable human avatars with diverse textures and detailed geometries, with full control over camera poses and body poses. Panels: (a) varying camera parameters; (b) varying body poses. Please refer to the Appendix for more multi-view and animation results.