2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
DOI: 10.1109/cvpr.2018.00143

AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks

Abstract: In this paper, we propose an Attentional Generative Adversarial Network (AttnGAN) that allows attention-driven, multi-stage refinement for fine-grained text-to-image generation. With a novel attentional generative network, the AttnGAN can synthesize fine-grained details at different subregions of the image by paying attentions to the relevant words in the natural language description. In addition, a deep attentional multimodal similarity model is proposed to compute a fine-grained image-text matching loss for…

Cited by 1,528 publications (1,678 citation statements)
References 27 publications
“…The generative network is used to synthesize images conditioned on the speech embedding feature. Following the recent works about text-to-images [3]-[5], we use a stacked conditional GAN, also known as StackGAN v2 [4], to synthesize the images due to its promising performance on generating photo-realistic images. As illustrated in Fig.…”
Section: Generative Network (mentioning)
confidence: 99%
“…Note: the text is shown only for readability, it is not used in the speech-to-image model. text-to-image translation, have been investigated in recent literature [3]-[5]. Besides, many languages have no writing form, which calls for the approaches to understand and visualize the speech directly [6].…”
Section: Introduction (mentioning)
confidence: 99%
“…such as conditional variational auto-encoder (CVAE) [10]-[12] and generative adversarial network (GAN) [13]-[18]. For instance, Yan et al. [11] proposed a disentangled CVAE-based method for attribute-conditioned image generation.…”
Section: Introduction (mentioning)
confidence: 99%
“…Artificial Intelligence is promising in the field [11], [12] because it has been successfully applied in different fields like business for identifying job-hopping patterns [13], image recognition, etc. It can accurately and efficiently predict thousands of possible structures.…”
Section: Introduction (mentioning)
confidence: 99%