2019 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2019.00766

Dual Adversarial Inference for Text-to-Image Synthesis

Abstract: Synthesizing images from a given text description involves engaging two types of information: the content, which covers what the text describes explicitly (e.g., color, composition), and the style, which the text usually leaves unspecified (e.g., location, quantity, size). Previous works, however, typically treat the task as generating images from the content alone, without learning meaningful style representations. In this paper, we aim to learn tw…
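
A minimal sketch of the content/style split the abstract describes, assuming a PyTorch-style conditional generator whose latent is divided into a text-derived content code and a free style code. All module names, layer sizes, and the 64x64 output resolution are illustrative assumptions, not the paper's actual architecture.

import torch
import torch.nn as nn

class ContentStyleGenerator(nn.Module):
    def __init__(self, text_dim=256, content_dim=128, style_dim=100, img_ch=3):
        super().__init__()
        # Project the sentence embedding into an explicit "content" code.
        self.content_head = nn.Linear(text_dim, content_dim)
        # Decode (content, style) into a 64x64 image; depths are placeholders.
        self.decode = nn.Sequential(
            nn.Linear(content_dim + style_dim, 128 * 8 * 8),
            nn.Unflatten(1, (128, 8, 8)),
            nn.Upsample(scale_factor=2), nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2), nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2), nn.Conv2d(32, img_ch, 3, padding=1), nn.Tanh(),
        )

    def forward(self, text_emb, style_noise):
        content = self.content_head(text_emb)         # factors the text describes
        z = torch.cat([content, style_noise], dim=1)  # style: factors it omits
        return self.decode(z)

G = ContentStyleGenerator()
fake = G(torch.randn(4, 256), torch.randn(4, 100))    # -> (4, 3, 64, 64)

Sampling several style codes for one fixed text embedding would vary the unspecified factors (location, quantity, size) while holding the described content fixed.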

Citations: cited by 33 publications (27 citation statements), published 2020–2024.
References: 14 publications.

“…GAN-based text-to-image generation. In the past few years, Generative Adversarial Networks (GANs) [18] have shown promising results on text-to-image generation [5, 8, 9, 14, 17, 22, 27–30, 34, 39, 40, 46, 47, 54, 56–58, 62–68]. GAN-INT-CLS [46] was the first to use a conditional GAN formulation for text-to-image generation.…”
Section: Related Work
confidence: 99%
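
As a hedged illustration of the conditional-GAN formulation credited to GAN-INT-CLS [46] above: the discriminator scores an image jointly with the text embedding it is supposed to match. The spatial-replication fusion and all dimensions here are common-practice assumptions, not the exact paper configuration.

import torch
import torch.nn as nn

class TextConditionalDiscriminator(nn.Module):
    def __init__(self, text_dim=256, img_ch=3):
        super().__init__()
        self.img_net = nn.Sequential(                    # 64x64 -> 4x4 features
            nn.Conv2d(img_ch, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(256, 512, 4, 2, 1), nn.LeakyReLU(0.2),
        )
        self.txt_proj = nn.Linear(text_dim, 128)
        self.joint = nn.Conv2d(512 + 128, 1, 4)          # fuse, then score

    def forward(self, img, text_emb):
        f = self.img_net(img)                            # (B, 512, 4, 4)
        t = self.txt_proj(text_emb)                      # (B, 128)
        t = t[:, :, None, None].expand(-1, -1, 4, 4)     # replicate over space
        return self.joint(torch.cat([f, t], dim=1)).view(-1)

D = TextConditionalDiscriminator()
score = D(torch.randn(2, 3, 64, 64), torch.randn(2, 256))  # match-and-real logit

Conditioning the discriminator this way penalizes images that are realistic but mismatched to their description, which is the key difference from an unconditional GAN.
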
“…We group as cycle-consistency approaches those T2I models that pass the generated image through an image captioning [67–69] or image encoder network [70], thereby creating a cycle back to the input description or latent code.…”
Section: Cycle Consistency
confidence: 99%
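
A sketch of the cycle idea in the excerpt above, assuming hypothetical generator and image_encoder modules that map into a shared text-embedding space; systems built on captioning networks [67–69] would instead decode words and compare them against the input description.

import torch.nn.functional as F

def cycle_consistency_loss(generator, image_encoder, text_emb, style_noise):
    # Generate, re-encode, and pull the recovered code back to the input text.
    fake = generator(text_emb, style_noise)
    recovered = image_encoder(fake)          # image -> text-embedding space
    return F.l1_loss(recovered, text_emb)    # a captioner + word loss also fits
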
“…T2IG performance with the stacked structure [7] as the backbone is further improved [8], [31] by adding a word-level attention mechanism [9]. Recently, more complex and effective structures for T2IG have been proposed [8], [29], [32], [33]. For instance, Qiao et al. [8] proposed an encoder-decoder-like model, MirrorGAN, which tries to recover the text description from the synthesized image to guarantee semantic consistency between the text description and the generated image.…”
Section: Natural Language-to-Image Generation
confidence: 99%
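
A sketch of a word-level attention step of the kind cited above [9]: each spatial location of an image feature map attends over word embeddings, so individual words can refine local image regions. Shapes and the scaling factor are illustrative assumptions.

import torch
import torch.nn.functional as F

def word_level_attention(img_feat, word_emb):
    # img_feat: (B, C, H, W) image features; word_emb: (B, T, C) word embeddings.
    B, C, H, W = img_feat.shape
    queries = img_feat.flatten(2).transpose(1, 2)        # (B, H*W, C)
    attn = torch.bmm(queries, word_emb.transpose(1, 2))  # (B, H*W, T)
    attn = F.softmax(attn / C ** 0.5, dim=-1)            # weights over words
    context = torch.bmm(attn, word_emb)                  # per-pixel word context
    return context.transpose(1, 2).reshape(B, C, H, W)

ctx = word_level_attention(torch.randn(2, 256, 16, 16), torch.randn(2, 18, 256))

The returned word-context map is typically concatenated with the image features before the next upsampling stage, letting fine-grained words (e.g., color terms) sharpen the regions they describe.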