2021 International Joint Conference on Neural Networks (IJCNN)
DOI: 10.1109/ijcnn52387.2021.9533527

DTGAN: Dual Attention Generative Adversarial Networks for Text-to-Image Generation

Cited by 38 publications (31 citation statements)
References 20 publications

“…Tao et al [34] proposed DFGAN where a matching-aware zero-centered gradient penalty loss is introduced to help stabilize the training of the conditional text-to-image GAN model. Zhang et al [35] designed DTGAN by utilizing spatial and channel attention modules and the conditional normalization to yield photo-realistic samples with a generator/discriminator pair. Zhang et al [36] developed XMC-GAN which studied contrastive learning in the context of text-to-image generation while producing visually plausible images via a simple single-stage framework.…”
Section: CGAN in Text-to-Image Generation (mentioning)
confidence: 99%
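The matching-aware zero-centered gradient penalty (MA-GP) mentioned in the excerpt above is straightforward to state in code. The sketch below is a minimal PyTorch version, assuming a discriminator that scores (image, sentence-embedding) pairs; the defaults k=2 and p=6 are the commonly reported settings for this penalty, not values taken from the excerpt.

```python
import torch

def ma_gp_loss(discriminator, real_images, sent_embs, k=2.0, p=6.0):
    """Matching-aware zero-centered gradient penalty (MA-GP), DF-GAN style.

    The penalty is computed on *real* images paired with their *matching*
    sentence embeddings, pushing the discriminator's gradient norm toward
    zero at the real data points. Hyper-parameters k and p are assumed
    defaults for illustration.
    """
    real_images = real_images.detach().requires_grad_(True)
    sent_embs = sent_embs.detach().requires_grad_(True)

    # Discriminator score for (real image, matching text) pairs.
    scores = discriminator(real_images, sent_embs)

    # Gradients of the scores w.r.t. both the image and the text condition.
    grads = torch.autograd.grad(
        outputs=scores,
        inputs=(real_images, sent_embs),
        grad_outputs=torch.ones_like(scores),
        create_graph=True,
        retain_graph=True,
    )
    grad_flat = torch.cat([g.reshape(g.size(0), -1) for g in grads], dim=1)

    # Zero-centered penalty: k * E[ ||grad||^p ].
    grad_norm = grad_flat.norm(2, dim=1)
    return k * grad_norm.pow(p).mean()
```

In training, this term would simply be added to the discriminator loss on real, text-matched pairs, which is what makes the penalty "matching-aware".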
“…After that, the generator G(z, (w, s)) is trained to produce a perceptually realistic and semantically related image Î according to a latent code z randomly sampled from a frozen distribution and word/sentence embedding vectors (w, s). To be specific, G(z, (w, s)) consists of multiple layers, where the first layer F_0 maps the latent code into a feature map and intermediate blocks typically leverage modulation modules (e.g., attention models [35, 2]) to reinforce the visual feature map to ensure image quality and semantic consistency. The last layer transforms the feature map into the ultimate sample.…”
Section: Preliminary (mentioning)
confidence: 99%
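The single-stage layout described in this excerpt (a first layer lifts the latent code to a feature map, intermediate text-modulated blocks refine it, a last layer emits the image) can be summarized with a small PyTorch skeleton. Everything below, including the plain sentence-conditioned scale-and-shift used in place of the cited attention modules, is an illustrative assumption rather than any paper's actual architecture.

```python
import torch
import torch.nn as nn

class ModulatedBlock(nn.Module):
    """Upsampling block whose features are modulated by the sentence
    embedding (a stand-in for the attention/normalization modules cited)."""

    def __init__(self, channels, sent_dim):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.gamma = nn.Linear(sent_dim, channels)  # text-driven scale
        self.beta = nn.Linear(sent_dim, channels)   # text-driven shift

    def forward(self, h, sent_emb):
        h = nn.functional.interpolate(h, scale_factor=2, mode="nearest")
        h = self.conv(h)
        scale = self.gamma(sent_emb).unsqueeze(-1).unsqueeze(-1)
        shift = self.beta(sent_emb).unsqueeze(-1).unsqueeze(-1)
        return nn.functional.leaky_relu(h * (1 + scale) + shift, 0.2)

class TextConditionedGenerator(nn.Module):
    """Skeleton of a single-stage text-to-image generator:
    latent code -> 4x4 feature map -> text-modulated blocks -> RGB image."""

    def __init__(self, z_dim=100, sent_dim=256, ngf=64, n_blocks=6):
        super().__init__()
        # First layer F_0: latent code -> 4x4 feature map.
        self.fc = nn.Linear(z_dim, ngf * 8 * 4 * 4)
        # Intermediate modulation blocks re-inject the sentence embedding.
        self.blocks = nn.ModuleList(
            [ModulatedBlock(ngf * 8, sent_dim) for _ in range(n_blocks)]
        )
        # Last layer: feature map -> image in [-1, 1].
        self.to_rgb = nn.Sequential(
            nn.Conv2d(ngf * 8, 3, kernel_size=3, padding=1),
            nn.Tanh(),
        )

    def forward(self, z, sent_emb):
        h = self.fc(z).view(z.size(0), -1, 4, 4)
        for block in self.blocks:
            h = block(h, sent_emb)  # 4x4 -> 8x8 -> ... -> 256x256
        return self.to_rgb(h)
```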
“…Tao et al [4] designed a matching-aware zero-centered gradient penalty (MA-GP) loss addressing the issues of the multi-stage framework. Zhang et al [1] presented DTGAN leveraging two new attention models and conditional adaptive instance-layer normalization to produce perceptually realistic samples with a generator/discriminator pair. Zhang et al [5] proposed DiverGAN, which can yield diverse and visually plausible samples that are semantically correlated with the given natural-language descriptions.…”
Section: Related Work, A. Text-to-Image Generation (mentioning)
confidence: 99%
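As a rough illustration of the conditional adaptive instance-layer normalization that the excerpt attributes to DTGAN, the sketch below blends instance-norm and layer-norm statistics with a learnable ratio and predicts the scale and shift from the sentence embedding. The module name, the blending parameter rho, and the linear predictors are assumptions made for this example, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ConditionalAdaILN(nn.Module):
    """Sketch of a text-conditional adaptive instance-layer normalization:
    normalize with a learnable mix of instance/layer statistics, then apply
    a scale and shift predicted from the sentence embedding."""

    def __init__(self, channels, sent_dim, eps=1e-5):
        super().__init__()
        self.eps = eps
        # rho in [0, 1] blends instance-norm and layer-norm statistics.
        self.rho = nn.Parameter(torch.full((1, channels, 1, 1), 0.5))
        self.to_gamma = nn.Linear(sent_dim, channels)
        self.to_beta = nn.Linear(sent_dim, channels)

    def forward(self, x, sent_emb):
        # Instance-norm statistics: per sample, per channel.
        in_mean = x.mean(dim=(2, 3), keepdim=True)
        in_var = x.var(dim=(2, 3), keepdim=True, unbiased=False)
        x_in = (x - in_mean) / torch.sqrt(in_var + self.eps)

        # Layer-norm statistics: per sample, over channels and pixels.
        ln_mean = x.mean(dim=(1, 2, 3), keepdim=True)
        ln_var = x.var(dim=(1, 2, 3), keepdim=True, unbiased=False)
        x_ln = (x - ln_mean) / torch.sqrt(ln_var + self.eps)

        rho = self.rho.clamp(0.0, 1.0)
        x_hat = rho * x_in + (1.0 - rho) * x_ln

        # Text-conditional scale and shift.
        gamma = self.to_gamma(sent_emb).unsqueeze(-1).unsqueeze(-1)
        beta = self.to_beta(sent_emb).unsqueeze(-1).unsqueeze(-1)
        return x_hat * (1.0 + gamma) + beta
```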
“…) denotes a suite of K natural-language descriptions, while S is cast into a training set and a testing set. Current conditional text-to-image GAN approaches [1]-[3] commonly follow the same paradigm. The generator G aims at yielding a visually plausible and semantically consistent sample Î_i according to a latent code randomly sampled from a fixed distribution and a text description c_i randomly picked from C_i, where c_i = (w_1, w_2, ..., w_m) contains m words.…”
Section: A. Preliminary (mentioning)
confidence: 99%
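Restated in LaTeX for readability, the setup this excerpt describes is roughly as below; because the quoted text is truncated, the symbols S, I_i, C_i, N, and the prior p_z are assumptions filled in from context.

```latex
% Dataset: N images, each paired with K natural-language descriptions
% (symbols assumed from context; the quoted excerpt is truncated).
\[
  S = \{(I_i, C_i)\}_{i=1}^{N}, \qquad
  C_i = \{c_i^{1}, \dots, c_i^{K}\}, \qquad
  c_i = (w_1, w_2, \dots, w_m),
\]
% The generator maps a latent code z drawn from a fixed prior and a
% randomly picked m-word description c_i to a synthetic image:
\[
  \hat{I}_i = G(z, c_i), \qquad z \sim p_z .
\]
```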