Objectives: To provide insight into deep generative models (DGMs) and review the most prominent and efficient ones, namely the Variational Auto-Encoder (VAE) and Generative Adversarial Networks (GANs). Methods: We provide a comprehensive overview of VAEs and GANs along with their advantages and disadvantages. This paper also surveys the recently introduced attention-based GANs and the most recently introduced Transformer-based GANs. Findings: GANs have been researched intensively because of their significant advantages over VAEs, and they are powerful generative models that have been widely employed in a variety of fields. Despite these advantages and their immense popularity and success, however, training GANs remains difficult and suffers from several failure modes. These include mode collapse, where the generator produces the same set of outputs for different inputs, ultimately resulting in a loss of diversity; non-convergence, due to oscillatory and diverging behaviour of the generator and discriminator during training; and vanishing or exploding gradients, where learning either ceases or proceeds very slowly. Recently, attention-based GANs and Transformer-based GANs have also been proposed for high-fidelity image generation. Novelty: Unlike previous survey articles, which often cover all DGMs and dive into their more complicated aspects, this work focuses on the most prominent DGMs, VAEs and GANs, and provides a theoretical understanding of them. Furthermore, because the GAN is currently the most extensively used DGM studied by the academic community, its literature warrants further exploration. Moreover, while numerous articles on GANs are available, none has analyzed the most recent attention-based GANs and Transformer-based GANs. This study therefore reviews these recently introduced attention-based GANs and Transformer-based GANs, a body of literature that has not yet been covered by any survey paper.
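The failure modes listed above all arise from the adversarial min-max game in which the generator and discriminator are updated in alternation. As a concrete illustration, below is a minimal PyTorch sketch of one standard (non-saturating) GAN training step; the toy MLP networks, dimensions, and optimiser settings are illustrative assumptions and are not drawn from any of the surveyed models.

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 64, 784          # toy sizes (assumptions, e.g. flattened 28x28 images)
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def gan_train_step(real_batch):
    """One alternating update of discriminator D and generator G."""
    n = real_batch.size(0)
    # Discriminator step: push D(real) towards 1 and D(G(z)) towards 0.
    fake = G(torch.randn(n, latent_dim)).detach()   # detach so only D receives gradients
    d_loss = bce(D(real_batch), torch.ones(n, 1)) + bce(D(fake), torch.zeros(n, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator step (non-saturating loss): push D(G(z)) towards 1.
    g_loss = bce(D(G(torch.randn(n, latent_dim))), torch.ones(n, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

# Example usage with random tensors standing in for a batch of real images.
print(gan_train_step(torch.randn(32, data_dim)))
```

Because the two updates pull the shared discriminator in opposite directions, there is no single loss surface being minimised, which is why the oscillation, non-convergence, and mode-collapse behaviours discussed above can appear in practice.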
Generating videos is a novel area of computer vision research that is still far from being solved, because videos are complex in nature and both spatial and temporal coherence must be maintained. Compared with unconditional video generation, automated video generation from a text description is an even more difficult task, in which maintaining semantic consistency and visual quality is crucial. Video generation from text is non-trivial owing to the intrinsic complexity of the individual frames and of the overall video structure, and conditional generative models are required for this challenging text-to-video generation task. Generative adversarial networks (GANs) have had considerable success in producing images conditioned on natural language descriptions, but they have yet to be employed to produce realistic videos from text that are temporally and spatially coherent and semantically consistent with the text descriptions. This paper therefore proposes a new Optimised Dual Discriminator Video Generative Adversarial Network (ODD-VGAN) for text-to-video generation, whose hyper-parameters are optimised using the improved reptile search algorithm (IRSA). The efficiency of the proposed approach is demonstrated by both qualitative and quantitative experimental results.
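The abstract does not specify the ODD-VGAN architecture or the IRSA search procedure, so the following is only a hypothetical sketch of how a dual-discriminator, text-conditioned video GAN is commonly structured: one discriminator judges individual frames for spatial realism and text relevance, while the second judges the whole clip for temporal coherence. All module names, layer choices, and tensor shapes below are assumptions made for illustration.

```python
import torch
import torch.nn as nn

# Toy sizes chosen for illustration only (assumptions, not from the paper).
text_dim, latent_dim, frames, ch, h, w = 256, 100, 8, 3, 32, 32

class VideoGenerator(nn.Module):
    """Maps a text embedding plus a noise vector to a short video clip."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(text_dim + latent_dim, frames * ch * h * w)
    def forward(self, text_emb, z):
        out = torch.tanh(self.fc(torch.cat([text_emb, z], dim=1)))
        return out.view(-1, frames, ch, h, w)

class FrameDiscriminator(nn.Module):
    """Scores each frame for spatial realism and relevance to the text."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(ch * h * w + text_dim, 1)
    def forward(self, video, text_emb):
        b, f = video.shape[:2]
        flat_frames = video.reshape(b * f, -1)
        text_rep = text_emb.repeat_interleave(f, dim=0)
        return self.fc(torch.cat([flat_frames, text_rep], dim=1))

class VideoDiscriminator(nn.Module):
    """Scores the whole clip for temporal coherence and semantic consistency."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(frames * ch * h * w + text_dim, 1)
    def forward(self, video, text_emb):
        return self.fc(torch.cat([video.flatten(1), text_emb], dim=1))

# Example forward pass with random tensors standing in for a text embedding and noise.
G, D_frame, D_video = VideoGenerator(), FrameDiscriminator(), VideoDiscriminator()
text_emb, z = torch.randn(2, text_dim), torch.randn(2, latent_dim)
fake_video = G(text_emb, z)
print(fake_video.shape, D_frame(fake_video, text_emb).shape, D_video(fake_video, text_emb).shape)
```

Splitting the critic into a frame-level and a clip-level discriminator is a common design choice in video GANs because it lets spatial quality and temporal consistency be penalised separately; whether ODD-VGAN follows exactly this split is not stated in the abstract.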