A Comparative Study on Variational Autoencoders and Generative Adversarial Networks

Sami, Mirza Tanzim; Mobin, Iftekharul

doi:10.1109/icaiit.2019.8834544

Cited by 17 publications

(8 citation statements)

References 12 publications


“…To model the cognitive processes required for the reconstruction and cross-modal generation tasks, we employed a mixture-of-experts multimodal variational autoencoder (MMVAE; Shi et al., 2019). The MMVAE is one of the generative models for multimodal learning that exhibits high performance in terms of generation quality.…”

Section: Methods (mentioning)

confidence: 99%

“…Both models learned the latent representations of […], but in different ways. For training, MMVAE maximizes the following objective function (Shi et al., 2019):…”

Section: Methods (mentioning)

confidence: 99%

“…Additionally, the concept of multimodal learning has been applied in the fields of machine learning and deep neural networks (Baltrušaitis et al., 2019; Suzuki and Matsuo, 2022). Learning by using multiple modalities enhances the performance of neural network models (Shi et al., 2019).…”

Section: Introduction (mentioning)

confidence: 99%


Emergence of number sense through the integration of multimodal information: developmental learning insights from neural network models

Noda, Soda, Yamashita (2024). Front. Neurosci.


Introduction: Associating multimodal information is essential for human cognitive abilities including mathematical skills. Multimodal learning has also attracted attention in the field of machine learning, and it has been suggested that the acquisition of better latent representation plays an important role in enhancing task performance. This study aimed to explore the impact of multimodal learning on representation, and to understand the relationship between multimodal representation and the development of mathematical skills.

Methods: We employed a multimodal deep neural network as the computational model for multimodal associations in the brain. We compared the representations of numerical information, that is, handwritten digits and images containing a variable number of geometric figures learned through single- and multimodal methods. Next, we evaluated whether these representations were beneficial for downstream arithmetic tasks.

Results: Multimodal training produced better latent representation in terms of clustering quality, which is consistent with previous findings on multimodal learning in deep neural networks. Moreover, the representations learned using multimodal information exhibited superior performance in arithmetic tasks.

Discussion: Our novel findings experimentally demonstrate that changes in acquired latent representations through multimodal association learning are directly related to cognitive functions, including mathematical skills. This supports the possibility that multimodal learning using deep neural network models may offer novel insights into higher cognitive functions.


“…Instead of only learning the compressed image, VAE learns the distribution of the data, and by exploiting the distribution, we can decode and produce new data. VAEs (Variational Auto-encoders) have also been highly successful, to the point where they are frequently mathematically more accurate at producing images that closely resemble their original dataset [9].…”

Section: Auto Encoders Based Image Synthesis (mentioning)

confidence: 99%

“…An architecture made up of both an encoder and a decoder that is trained to minimize the reconstruction error between the encoded-decoded data and the starting data is known as a Variational Auto encoder (VAE) [9]. Instead of encoding an input as a single point, we encode it as a distribution over the latent space in order to introduce some regularization of the latent space.…”

Section: VAE (mentioning)

confidence: 99%
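The two ingredients named in the statement above (a reconstruction error, and an encoding of each input as a distribution over the latent space that is regularized) can be illustrated with a minimal sketch. It is not taken from the cited paper: the squared-error reconstruction term, the diagonal Gaussian posterior, the standard normal prior, and every function and variable name below are assumptions chosen for illustration.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1) per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dims.

    This is the regularizer: it pulls each encoded distribution toward
    the prior (assumed here to be a standard normal).
    """
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def vae_loss(x, x_recon, mu, log_var):
    """Squared reconstruction error plus the KL regularizer (assumed form)."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + kl_to_standard_normal(mu, log_var)


# Hypothetical encoder output for one input: a mean and a log-variance
# per latent dimension, rather than a single point.
rng = random.Random(0)
mu, log_var = [0.5, -1.0], [0.0, -2.0]
z = reparameterize(mu, log_var, rng)  # latent sample passed to a decoder
```

In a real model `mu`, `log_var`, and `x_recon` would come from trained encoder and decoder networks; here they are fixed toy values so that only the loss terms are shown.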

Clothing Fashion Image Generation From Text Using Artificial Intelligence

Shaheen, Iqbal (2023). IJEAST


Development of dynamic, intensely engaging, and fascinating images has greatly benefited from the recent exponential advancements in image synthesis techniques. The architecture proposed in this research allows users to enter text regarding a particular dress, and the model then creates images of fashionable apparel based on that content. The suggested model can let people become their own fashion designers by utilizing the strength of deep learning and artificial intelligence to create a variety of fashionable outfits for themselves. The DALL-E model is utilized to generate realistic images based on a text description. DALL-E is an artificial intelligence model that generates realistic images from a description in natural language. While there are alternative text-to-image systems, DALL-E produces far more coherent visuals. The world and the relationships between objects appear to be well understood by this technology. DALL-E uses the GPT-3 model and a dataset of text-image pairs for image synthesis. Each image is encoded into a 32×32 grid using VQ-VAE. The image and text are then combined into a single stream for training DALL-E. The Deep Fashion dataset is used for training DALL-E; it is a more realistic dataset and contains high-definition images that further enable accurate generation. After training, DALL-E produces more accurate results and provides a higher inception score than preceding models.


MAGAN: Mode Information and Attention-Based GAN for Realistic Time Series Data Synthesis

Wang, Luo, Ren et al. (2024). Lecture Notes in Computer Science

