2021
DOI: 10.1016/j.imavis.2021.104284

Transformer models for enhancing AttnGAN based text to image generation

Cited by 26 publications (13 citation statements)
References 12 publications
“…Attentional Generative Adversarial Network (AttnGAN) enables multi-stage, attention-driven image generation from a textual description [7], [8]. AttnGAN begins with a rudimentary low-resolution image, which it then refines in multiple phases to produce a final image from the natural language description.…”
Section: A. GANs
confidence: 99%
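To make the staged refinement this excerpt describes concrete, below is a minimal PyTorch-style sketch: a coarse feature map is repeatedly fused with word-level attention context and upsampled. All module names, dimensions, and the attention layout are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WordAttention(nn.Module):
    """Attend from each spatial location to the caption's word embeddings (sketch)."""
    def __init__(self, feat_dim, word_dim):
        super().__init__()
        self.proj = nn.Linear(word_dim, feat_dim)  # map words into image-feature space

    def forward(self, img_feats, word_embs):
        # img_feats: (B, C, H, W); word_embs: (B, T, word_dim)
        B, C, H, W = img_feats.shape
        words = self.proj(word_embs)                    # (B, T, C)
        queries = img_feats.flatten(2).transpose(1, 2)  # (B, H*W, C)
        attn = torch.softmax(queries @ words.transpose(1, 2), dim=-1)  # (B, H*W, T)
        return (attn @ words).transpose(1, 2).reshape(B, C, H, W)

class RefineStage(nn.Module):
    """One refinement phase: fuse attended word context, then upsample 2x."""
    def __init__(self, feat_dim, word_dim):
        super().__init__()
        self.attn = WordAttention(feat_dim, word_dim)
        self.fuse = nn.Conv2d(2 * feat_dim, feat_dim, kernel_size=3, padding=1)

    def forward(self, img_feats, word_embs):
        ctx = self.attn(img_feats, word_embs)
        fused = F.relu(self.fuse(torch.cat([img_feats, ctx], dim=1)))
        return F.interpolate(fused, scale_factor=2, mode="nearest")

# Coarse 64x64 features refined twice -> 256x256, mimicking a multi-stage pipeline.
stage = RefineStage(feat_dim=32, word_dim=256)
feats = torch.randn(1, 32, 64, 64)
words = torch.randn(1, 12, 256)
feats = stage(stage(feats, words), words)  # (1, 32, 256, 256)
```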
“…The authors of [8] presented AttnGANTRANS, which combines Attentional GAN with transformer models such as Bidirectional Encoder Representations from Transformers (BERT), GPT2, and XLNet, and was capable of extracting semantic information from text descriptions more accurately than the conventional AttnGAN. Gao et al. [13] proposed LD-CGAN, comprising one generator and two independent discriminators to regularize and generate 64×64 and 128×128 images.…”
Section: B. Text to Image Synthesis
confidence: 99%
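As an illustration of the excerpt's core idea, swapping AttnGAN's recurrent text encoder for a pretrained transformer, here is a minimal sketch using Hugging Face's BERT. The variable names and the use of the pooled output as the sentence-level conditioning vector are assumptions for illustration, not the AttnGANTRANS pipeline itself.

```python
import torch
from transformers import BertModel, BertTokenizer

# Hypothetical sketch: a pretrained BERT as text encoder in place of AttnGAN's
# RNN; GPT2 or XLNet could be swapped in via their own model classes.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")

caption = "a small bird with a red head and a white belly"
tokens = tokenizer(caption, return_tensors="pt")
with torch.no_grad():
    out = encoder(**tokens)

word_embs = out.last_hidden_state  # (1, T, 768): word-level features for attention
sent_emb = out.pooler_output       # (1, 768): sentence-level conditioning vector
```

The word-level features would feed the attention stages of the generator, while the sentence embedding conditions the initial low-resolution stage.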
“…Through further studies, researchers have made several improvements to the Transformer, resulting in networks such as DETR [24], ViT [25], and SETR-MLP [26]. These networks have been applied to various fields, such as object detection [27,28,29], semantic segmentation [30,31,32], image classification [33,34,35], and image generation [36]. In this study, the advanced Swin-Transformer [37] network is used to conduct research in the following three aspects: (1) proposing a high-precision classification and detection model for mutton multi-parts; (2) testing the robustness, generalization, and anti-occlusion performance of the proposed model; (3) introducing other mainstream detection algorithms to evaluate the advantages and disadvantages of the proposed model and test its real-time performance.…”
Section: Introduction
confidence: 99%
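For context on adapting a pretrained Swin-Transformer to a multi-class task like the one this excerpt describes, below is a minimal sketch using torchvision's swin_t. The class count and head replacement are illustrative assumptions, not the cited study's model.

```python
import torch
import torch.nn as nn
from torchvision.models import swin_t, Swin_T_Weights

NUM_CLASSES = 6  # illustrative: e.g. the number of part categories in the task
model = swin_t(weights=Swin_T_Weights.IMAGENET1K_V1)
model.head = nn.Linear(model.head.in_features, NUM_CLASSES)  # replace 1000-class head

x = torch.randn(1, 3, 224, 224)  # dummy RGB image batch
logits = model(x)                # (1, NUM_CLASSES)
```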