In this paper, we focus on generating photo-realistic images from given text descriptions. Current methods first generate an initial image and then progressively refine it to a high-resolution one. These methods typically indiscriminately refine all granularity features output from the previous stage. However, the ability to express different granularity features in each stage is not consistent, and it is difficult to express precise semantics by further refining the features with poor quality generated in the previous stage. Current methods cannot refine different granularity features independently, resulting in that it is challenging to clearly express all factors of semantics in generated image, and some features even become worse. To address this issue, we propose a Hierarchical Disentangled Representations Generative Adversarial Networks (HDR-GAN) to generate photo-realistic images by explicitly disentangling and individually modeling the factors of semantics in the image. HDR-GAN introduces a novel component called multi-granularity feature disentangled encoder to represent image information comprehensively through explicitly disentangling multi-granularity features including pose, shape and texture. Moreover, we develop a novel Multi-granularity Feature Refinement (MFR) containing a Coarsegrained Feature Refinement (CFR) model and a Fine-grained Feature Refinement (FFR) model. CFR utilizes coarse-grained disentangled representations (e.g., pose and shape) to clarify category information, while FFR employs fine-grained disentangled representations (e.g., texture) to reflect instance-level details. Extensive experiments on two well-studied and publicly available datasets (i.e., CUB-200 and CLEVR-SV) demonstrate the rationality and superiority of our method.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.