2020
DOI: 10.1109/access.2020.3017881
TiVGAN: Text to Image to Video Generation With Step-by-Step Evolutionary Generator

Abstract: Advances in technology have led to the development of methods that can create desired visual multimedia. In particular, image generation using deep learning has been extensively studied across diverse fields. In comparison, video generation, especially on conditional inputs, remains a challenging and less explored area. To narrow this gap, we aim to train our model to produce a video corresponding to a given text description. We propose a novel training framework, Text-to-Image-to-Video Generative Adversarial …

Cited by 49 publications (17 citation statements)
References 19 publications
“…To maintain frame-level and video-level coherence, two discriminators were used. Kim et al. [82] proposed the Text-to-Image-to-Video Generative Adversarial Network (TiVGAN) for text-based video generation. The key idea is to begin by learning text-to-single-image generation, then gradually increase the number of generated frames, repeating until the desired video length is reached.…”
Section: Synthesizing Videos Using …
confidence: 99%
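The step-by-step growth described in the statement above can be sketched as a training schedule. The frame-doubling rule below is only our reading of the summary (the paper's exact stage lengths and per-stage frame counts are not given here), so treat it as an assumption:

```python
def tivgan_stage_schedule(target_frames):
    """Hypothetical frame-count schedule for TiVGAN-style training.

    Stage 1 trains text-to-single-image generation; each later stage
    increases the number of generated frames (doubling is an
    assumption inferred from the summary) until the target video
    length is reached.
    """
    frames = 1
    schedule = [frames]
    while frames < target_frames:
        frames = min(frames * 2, target_frames)
        schedule.append(frames)
    return schedule

# Growing toward a 16-frame clip:
print(tivgan_stage_schedule(16))  # -> [1, 2, 4, 8, 16]
```

Each entry corresponds to one evolutionary stage of the generator, with the final stage producing the full-length clip.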
“…To achieve stable training and avoid mode collapse, the authors used the Wasserstein GAN with gradient penalty (WGAN-GP) [52] loss and the progressively growing GAN (ProGAN) [78] technique, which has been shown to generate high-resolution images. VPGAN [66]
Video generation and prediction GAN models:
Unconditional video generation: VGAN [165], TGAN [140], FTGAN [125], MoCoGAN [160], DVD-GAN [24]
Conditional video generation: VAE-GAN [103], TGANs-C [140], TFGAN [9], StoryGAN [102], BoGAN [18], TiVGAN [82], LSTM and cGAN [71], RNN-based GAN [167], TemporalGAN [195]…”
Section: Video Prediction
confidence: 99%
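For reference, the WGAN-GP critic objective mentioned above adds a gradient penalty λ(‖∇f(x̂)‖ − 1)² on samples x̂ interpolated between real and fake data. A minimal numerical sketch, using a hypothetical *linear* critic f(x) = w·x so its input gradient (the constant vector w) is available in closed form; real implementations compute the gradient by automatic differentiation:

```python
import math
import random

def wgan_gp_critic_loss(w, x_real, x_fake, lam=10.0, seed=0):
    """WGAN-GP critic loss for a toy linear critic f(x) = w . x.

    For a linear critic the input gradient is the constant vector w,
    so the penalty reduces to lam * (||w|| - 1)^2; the interpolated
    samples x_hat are still formed to show the general recipe.
    """
    rng = random.Random(seed)
    dot = lambda a, b: sum(ai * bi for ai, bi in zip(a, b))
    mean = lambda xs: sum(xs) / len(xs)
    # Interpolate real and fake samples (used by autodiff in practice).
    x_hat = []
    for xr, xf in zip(x_real, x_fake):
        e = rng.random()
        x_hat.append([e * a + (1.0 - e) * b for a, b in zip(xr, xf)])
    grad_norm = math.sqrt(dot(w, w))          # grad of f at x_hat is w
    penalty = lam * (grad_norm - 1.0) ** 2
    # Wasserstein estimate E[f(fake)] - E[f(real)], plus the penalty.
    return mean([dot(x, w) for x in x_fake]) - mean([dot(x, w) for x in x_real]) + penalty

# Unit-norm critic: penalty vanishes and only the Wasserstein gap remains.
print(wgan_gp_critic_loss([1.0, 0.0], [[1.0, 0.0]], [[0.0, 0.0]]))  # -> -1.0
```

With a non-unit-norm critic (e.g. w = [2, 0]) the penalty term dominates, which is exactly the pressure that keeps the critic approximately 1-Lipschitz during training.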
“…Generic video synthesis methods generate videos by sampling from a random distribution [56,57]. To gain more control over the generated content, conditional video synthesis works utilize input signals such as images [7,17], text or language [5,28], and action classes [62]. This enables synthesizing videos that contain the desired objects, as specified by visual information, or the desired actions, as specified by textual information.…”
Section: Introduction
confidence: 99%
“…Existing works on conditional video generation use only one of the possible control signals as input [6,28]. This limits the flexibility and quality of the generative process.…”
Section: Introduction
confidence: 99%
“…Recently there has been interest in the generative modeling community in reconstructing videos from lower-bitrate alternatives such as text or low-dimensional latent spaces [23,26,42]. While there has been significant progress in using generative machine learning to model natural images from text [9,27,31], these approaches are currently unable to produce high-quality videos.…”
Section: Introduction
confidence: 99%