2023
DOI: 10.48550/arxiv.2301.12959
Preprint
GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis

Abstract: Synthesizing high-fidelity complex images from text is challenging. Based on large pretraining, the autoregressive and diffusion models can synthesize photo-realistic images. Although these large models have shown notable progress, there remain three flaws. 1) These models require tremendous training data and parameters to achieve good performance. 2) The multi-step generation design slows the image synthesis process heavily. 3) The synthesized visual features are difficult to control and require delicately de…

Cited by 3 publications (3 citation statements)
References 21 publications
“…Concurrent with our method, StyleGAN-T [85] and GALIP [92] share similar goals to ours. However, Giga-GAN and the aforementioned techniques were developed independently with distinct technical contributions.…”
Section: Related Work
confidence: 83%
“…The classification of the input-output types of AIGC is given in Table 5:
- Secure Steganography Based on Generative Adversarial Networks (SS-GAN) [96]
- Manipulate/edit images using textual descriptions: Text-Adaptive Generative Adversarial Network (TAGAN) [67]
- Generate images based on textual instructions: Denoising Diffusion Probabilistic Models (DDPM) [101], Guided Language to Image Diffusion for Generation (GLIDE) [147], Imagen [148], Attentional Generative Adversarial Networks (AttnGAN) [92], CogView [77], Auxiliary Classifier GANs (AC-GAN) [149], Stacked Generative Adversarial Networks (StackGAN) [93], alignDRAW (Deep Recurrent Attention Writer) [78], Deep Convolutional Generative Adversarial Networks (DCGAN) [86], Muse [150], Text Conditioned Auxiliary Classifier GAN (TAC-GAN) [67]
- Image: generate more complex images using captions: Generative Adversarial CLIPs (GALIP) [151]
- Image: generate original, realistic images and art using a text prompt: Contrastive Language Image Pre-training (CLIP) [130]
- Molecule: text-based de novo molecule generation, molecule captioning: MolT5 (Molecular T5) [58]
- Molecule structure: generate or retrieve molecular structures using textual description: Text2Mol [60]
- Speech: synthesize custom voice speech using text: Adaptive Text to Speech (AdaSpeech) [113]
- Convert text to human-like speech: Denoising Diffusion Model for Text-to-Speech (Diff-TTS) [63], Grad-TTS [152], ProDiff [53], Diff-GAN-TTS [128], WaveNet [123]
- Generate speech using text: Feed-Forward Transformer (FFT) [42]
- Generate high-quality, synthetic musical audio clips: Generative Adversarial Networks Synth (GANSynth) [153]
- Text…”
Section: AIGC Input-Output Classification
confidence: 99%
“…This has been performed by embedding text prompts using a pre-trained CLIP ViT-L/14 text encoder. Tao et al. [54] proposed generative adversarial CLIPs, known as GALIP, which uses a pre-trained CLIP model in both the discriminator and generator.…”
Section: Recent Studies
confidence: 99%
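The architectural idea in the citation statement above (a single frozen text encoder conditioning both the generator and the discriminator) can be sketched schematically. This is not GALIP's actual implementation: the real model uses the pre-trained CLIP text and image encoders, while the stand-in below replaces them with fixed random projections purely to show the data flow.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "CLIP-like" text encoder stand-in (hypothetical): GALIP uses the
# pre-trained CLIP text encoder; here it is a fixed random projection.
VOCAB, EMBED_DIM = 32, 64
W_text = rng.standard_normal((VOCAB, EMBED_DIM))

def encode_text(token_ids):
    """Map a toy token-id list to a frozen text embedding."""
    bag = np.zeros(VOCAB)
    bag[token_ids] = 1.0
    return bag @ W_text

# Generator: one-step mapping from (noise, text embedding) to an "image",
# reflecting the single-step GAN synthesis contrasted with multi-step
# diffusion in the abstract.
NOISE_DIM, IMG_SHAPE = 128, (3, 8, 8)
W_gen = rng.standard_normal((NOISE_DIM + EMBED_DIM, np.prod(IMG_SHAPE))) * 0.1

def generator(z, text_emb):
    x = np.concatenate([z, text_emb])
    return np.tanh(x @ W_gen).reshape(IMG_SHAPE)

# Discriminator: scores image/text compatibility, reusing the SAME frozen
# text embedding -- the point emphasized in the citation statement.
W_img = rng.standard_normal((np.prod(IMG_SHAPE), EMBED_DIM)) * 0.1

def discriminator(image, text_emb):
    img_emb = image.reshape(-1) @ W_img
    return float(img_emb @ text_emb)  # higher means judged more compatible

text_emb = encode_text([1, 5, 9])
fake = generator(rng.standard_normal(NOISE_DIM), text_emb)
score = discriminator(fake, text_emb)
print(fake.shape, np.isfinite(score))
```

In training, only `W_gen` and `W_img` would receive gradients; the frozen text weights `W_text` stand in for the shared pre-trained encoder that both networks consult.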