For a given text, previous text-to-image synthesis methods commonly use a multistage generation model to produce high-resolution images in a coarse-to-fine manner. However, these methods ignore the interaction among stages and do not constrain the cross-sample relations of images generated at different stages to be consistent. These deficiencies result in inefficient generation and discrimination. In this study, we propose an interstage cross-sample similarity distillation model based on a generative adversarial network (GAN) for efficient text-to-image synthesis. To strengthen the interaction among stages, we perform interstage knowledge distillation from the refined stage to the coarse stages with novel interstage cross-sample similarity distillation blocks. To tighten the constraint on the cross-sample relations of images generated at different stages, we conduct cross-sample similarity distillation among the stages. Extensive experiments on the Oxford-102 and Caltech-UCSD Birds-200-2011 (CUB) datasets show that our model generates visually pleasing images and achieves quantitative performance comparable to state-of-the-art methods.
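The abstract does not give the exact formulation of the cross-sample similarity distillation loss, so the following is only a minimal sketch of the general idea under stated assumptions: it uses PyTorch, represents cross-sample relations as a cosine-similarity matrix over the batch, and treats the refined stage's matrix as a detached teacher that the coarse stage is pushed to match. The function names and the MSE matching criterion are illustrative choices, not the paper's confirmed implementation.

```python
import torch
import torch.nn.functional as F


def cross_sample_similarity(features: torch.Tensor) -> torch.Tensor:
    """Pairwise cosine-similarity matrix across the samples in a batch.

    features: (B, C, H, W) feature maps produced at one generation stage.
    Returns a (B, B) matrix encoding how the B samples relate to each other.
    """
    flat = features.flatten(start_dim=1)   # (B, C*H*W)
    flat = F.normalize(flat, dim=1)        # unit-length rows
    return flat @ flat.t()                 # (B, B) cosine similarities


def similarity_distillation_loss(coarse_feats: torch.Tensor,
                                 refined_feats: torch.Tensor) -> torch.Tensor:
    """Match the coarse stage's cross-sample relations to the refined stage's.

    The refined-stage similarity matrix is detached so that gradients only
    push the coarse stage toward relations consistent with the refined stage.
    """
    sim_coarse = cross_sample_similarity(coarse_feats)
    sim_refined = cross_sample_similarity(refined_feats).detach()
    return F.mse_loss(sim_coarse, sim_refined)


if __name__ == "__main__":
    # Toy example: 8 samples, 64-channel feature maps at two resolutions.
    coarse = torch.randn(8, 64, 16, 16, requires_grad=True)
    refined = torch.randn(8, 64, 64, 64)
    loss = similarity_distillation_loss(coarse, refined)
    loss.backward()
    print(f"distillation loss: {loss.item():.4f}")
```

In a multistage GAN, a term like this would typically be added to each coarse stage's generator objective alongside the adversarial and text-conditioning losses, so that the relations among images in a batch stay consistent from coarse to refined outputs.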