Guided Variational Autoencoder for Disentanglement Learning

Ding, Zheng; Xu, Yifan; Xu, Weijian; Parmar, Gaurav; Yang, Yang; Welling, Max; Tu, Zhuowen

doi:10.1109/cvpr42600.2020.00794

Cited by 86 publications (75 citation statements)

References 14 publications

“…This shows that the proposed model is able to provide a rich data representation. We also compare with the recently proposed GUIDE model [62]. … (18), and the Gumbel-Softmax dropout (MVAE-GS) using equation (24) as the objective function, are provided in Table VI.…”

Section: F. Evaluation of the Representation Learning (mentioning)

confidence: 99%

Deep Mixture Generative Autoencoders

Borş

2022

IEEE Trans. Neural Netw. Learning Syst.

Variational autoencoders (VAEs) are one of the most popular unsupervised generative models, and they rely on learning latent representations of data. In this paper, we extend the classical concept of Gaussian mixtures into the deep variational framework by proposing a mixture of VAEs (MVAE). Each component in the MVAE model is implemented by a variational encoder and has an associated sub-decoder. The separation between the latent spaces modelled by different encoders is enforced using the d-variable Hilbert-Schmidt Independence Criterion (dHSIC). Each component captures different variational features of the data. We also propose a mechanism for finding the appropriate number of VAE components for a given task, leading to an optimal architecture. The differentiable categorical Gumbel-Softmax distribution is used to generate dropout masking parameters within the end-to-end backpropagation training framework. Extensive experiments show that the proposed MVAE model learns a rich latent data representation and is able to discover additional underlying data factors.

Index Terms: Mixtures of Variational Autoencoders, Generative deep learning, Representation learning, Optimal number of components in mixtures.
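The abstract mentions using the Gumbel-Softmax distribution to produce dropout masks over mixture components. As a rough illustration only, the sketch below draws a relaxed keep/drop sample per component in plain Python; the helper `relaxed_dropout_mask`, its (keep_logit, 0.0) parameterization, and the temperature value are hypothetical choices, not the MVAE paper's actual formulation. In a real training loop the same formula would run inside an autodiff framework (PyTorch, for instance, provides `torch.nn.functional.gumbel_softmax`) so gradients reach the logits.

```python
import math
import random


def gumbel_softmax(logits, temperature=0.5, rng=random):
    """Draw one relaxed (soft) one-hot sample from a categorical given by logits."""
    scores = []
    for logit in logits:
        # Gumbel(0, 1) noise g = -log(-log(u)), u ~ Uniform(0, 1); clamp u to avoid log(0).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        g = -math.log(-math.log(u))
        scores.append((logit + g) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relaxed_dropout_mask(keep_logits, temperature=0.5, rng=random):
    """Hypothetical helper: one soft keep/drop decision per mixture component.

    Each component gets a two-way (keep, drop) Gumbel-Softmax sample with logits
    (keep_logit, 0.0); the 'keep' probability is used as a mask value in [0, 1].
    """
    return [gumbel_softmax([logit, 0.0], temperature, rng)[0] for logit in keep_logits]


rng = random.Random(0)
mask = relaxed_dropout_mask([50.0, -50.0, 0.0], temperature=0.5, rng=rng)
```

Lowering the temperature pushes each mask entry toward 0 or 1 (closer to hard dropout), while higher temperatures give smoother masks and lower-variance gradients.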

“…Training of a VAE can be understood as maximization of the dataset log-likelihood with the addition of a Kullback-Leibler regularization term $D_{KL}[q_\phi(Z|S) \,\|\, p_\theta(Z|S)]$, where $p_\theta(Z|S)$ is the posterior of the decoder [28,29]. Our VAE architecture is intentionally "vanilla" [33], and our encoder and decoder use a simple architecture, unlike more sophisticated VAE implementations which use convolutional layers [19], multi-stage training [63], disentanglement learning [33], Riemannian Brownian motion priors [64], and more. It reflects a simple VAE architecture that has been implemented as a VAE-GPSM in prior work.…”

Section: vVAE (mentioning)

confidence: 99%
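The KL term against the true posterior $p_\theta(Z|S)$ in the quote above cannot be computed directly. In the standard VAE formulation, $\log p_\theta(S) = \mathrm{ELBO} + D_{KL}[q_\phi(Z|S) \,\|\, p_\theta(Z|S)]$, so maximizing the ELBO raises the likelihood bound while shrinking that KL gap. The ELBO itself contains a KL term against the prior $p(Z)$, which has a closed form when the encoder is a diagonal Gaussian and the prior is standard normal. A minimal sketch under exactly those assumptions; the function names and the `beta` weight are illustrative and not taken from the quoted paper.

```python
import math


def gaussian_kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def negative_elbo(reconstruction_log_likelihood, mu, log_var, beta=1.0):
    """Loss to minimize: -E_q[log p(S|Z)] + beta * KL(q(Z|S) || p(Z)).

    reconstruction_log_likelihood: an estimate (e.g. single-sample Monte Carlo) of
    E_q[log p(S|Z)], computed elsewhere by the decoder. beta = 1.0 is the plain
    ("vanilla") VAE objective; beta is a hypothetical knob added for illustration.
    """
    return -reconstruction_log_likelihood + beta * gaussian_kl_to_standard_normal(mu, log_var)
```

For example, `negative_elbo(-2.0, [1.0], [0.0])` returns `2.5`: the KL contribution is 0.5 * (1 + 1 - 1 - 0) = 0.5 and the reconstruction term contributes 2.0.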

“…These GPSMs include a pairwise Potts Hamiltonian model with pairwise interaction terms (Mi3) [32], a vanilla variational autoencoder (vVAE), and a site-independent model which does not model covariation (Indep). We choose the simplest, original [28,29] "vanilla" [33] architecture, qualitatively similar to that used in previous VAE-GPSM studies [21], as opposed to more complex VAE-GPSM architectures used by others (see Methods) [19,22]. We evaluate the generative capacity of a model using four MSA statistics: pairwise covariance correlations [2,13,34,35], higher-order marginals ($r_{20}$) [13], Hamming distance distributions [2,13,36,37], and statistical energy correlations.…”

Section: Introduction (mentioning)

confidence: 99%
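One of the MSA statistics listed above is the Hamming distance distribution. A minimal sketch of one plausible variant, assuming aligned sequences of equal length and a histogram over all unordered sequence pairs; the quoted study may define the distribution differently (for example, distances to the nearest natural sequence), so the function names and pairing scheme here are illustrative only.

```python
from collections import Counter
from itertools import combinations


def hamming(a, b):
    """Number of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def hamming_distribution(msa):
    """Histogram {distance: count} of pairwise Hamming distances over all unordered pairs."""
    return Counter(hamming(a, b) for a, b in combinations(msa, 2))
```

For example, `hamming_distribution(["ACDE", "ACDF", "GCDF"])` gives `Counter({1: 2, 2: 1})`. Comparing a generated set against a natural set would then amount to comparing two such histograms.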

Generative Capacity of Probabilistic Protein Sequence Models

McGee, Novinger, Levy, et al. 2021

Preprint

Potts models and variational autoencoders (VAEs) have recently gained popularity as generative protein sequence models (GPSMs) to explore fitness landscapes and predict the effect of mutations. Despite encouraging results, quantitative characterization and comparison of GPSM-generated probability distributions is still lacking. It is currently unclear whether GPSMs can faithfully reproduce the complex multi-residue mutation patterns that arise from epistasis in natural sequences. We develop a set of sequence statistics to comparatively assess the accuracy, or “generative capacity”, of three GPSMs: a pairwise Potts Hamiltonian, a vanilla VAE, and a site-independent model, using natural and synthetic datasets. We show that the generative capacity of the Potts Hamiltonian model is the largest; the higher-order mutational statistics generated by the model agree with those observed for natural sequences. In contrast, we show that the vanilla VAE’s generative capacity lies between the pairwise Potts and site-independent models. Importantly, our work measures GPSM generative capacity in terms of higher-order sequence covariation and provides a new framework for evaluating and interpreting GPSM accuracy that emphasizes the role of epistasis.
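The pairwise Potts Hamiltonian assigns each sequence a statistical energy built from single-site fields and pairwise couplings. A minimal sketch, assuming the convention $E(S) = \sum_i h_i(s_i) + \sum_{i<j} J_{ij}(s_i, s_j)$ with $P(S) \propto e^{-E(S)}$; sign conventions differ between tools, and the dictionary-based layout of `h` and `J` is a hypothetical choice for readability, not the format used by Mi3.

```python
from itertools import combinations


def potts_energy(seq, h, J):
    """Statistical energy E(S) = sum_i h_i(s_i) + sum_{i<j} J_ij(s_i, s_j).

    h: list with one dict per position, mapping residue -> field value.
    J: dict mapping a position pair (i, j), i < j, to a dict (a, b) -> coupling value.
    Missing entries count as 0. Under P(S) proportional to exp(-E(S)),
    lower energy means a more probable sequence.
    """
    energy = sum(h[i].get(a, 0.0) for i, a in enumerate(seq))
    for i, j in combinations(range(len(seq)), 2):
        energy += J.get((i, j), {}).get((seq[i], seq[j]), 0.0)
    return energy


h = [{"A": -1.0, "C": 0.5}, {"A": 0.2, "C": -0.3}]
J = {(0, 1): {("A", "C"): -2.0}}
```

With these toy parameters, `potts_energy("AC", h, J)` is -1.0 - 0.3 - 2.0 = -3.3, while `"CA"` scores 0.5 + 0.2 = 0.7, so `"AC"` is the more probable sequence. A "statistical energy correlation" would then compare such energies across two models or datasets.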

“…Although remarkable results were obtained by these unsupervised DRL methods using toy datasets such as dSprites [22] and 3D Shapes [23], there is no guarantee that each latent variable corresponds to a single semantically meaningful factor of variation without any inductive bias [10], [24], [25]. Hence, recent DRL studies have focused on introducing into the model an explicit prior that imposes constraints or regularizations based on the underlying structure of complicated real-world images [26], [27], such as translation and rotation [2], [28], hierarchical features [8], [9], [29], and domain-specific knowledge [10].…”

Section: Introduction (mentioning)

confidence: 99%

“…By assuming that an image consists of local subspaces (sets of pixels) corresponding to objects, DRL methods partition the image to disentangle the latent representation, with the latent variables being separated into distinct sets corresponding to the subspaces. However, conventional DRL methods based on image segmentation change the standard VAE backbone to perform image segmentation and representation learning simultaneously [28]. Since the standard VAE has a statistical backbone supported by the variational Bayesian method, architectural changes to the VAE model may cause unstable learning and degrade its disentanglement ability [5], [28].…”

Section: Introduction (mentioning)

confidence: 99%

Disentangled Representation Learning in Real-World Image Datasets via Image Segmentation Prior

et al. 2021

We propose a novel method that can learn easy-to-interpret latent representations in real-world image datasets using a VAE-based model by splitting an image into several disjoint regions. Our method performs object-wise disentanglement by exploiting image segmentation and alpha compositing. Following the remarkable results obtained by unsupervised disentanglement methods on toy datasets, recent studies have tackled the challenging problem of disentanglement in real-world image datasets. However, these methods deviate from the standard VAE architecture, which has favorable disentanglement properties. Thus, to disentangle real-world images while preserving the VAE backbone, we designed an encoder and a decoder that embed an image into disjoint sets of latent variables corresponding to objects. The encoder includes a pre-trained image segmentation network, which allows our model to focus only on representation learning while adopting image segmentation as an inductive bias. Evaluations using real-world image datasets, CelebA and Stanford Cars, showed that our method achieves improved disentanglement and transferability.

INDEX TERMS: Alpha blend, disentanglement, image segmentation, real-world image, representation learning.
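The abstract says the decoder recombines per-object outputs via alpha compositing but does not state the exact equation. One common choice is the standard back-to-front "over" operator, $C_{out} = \alpha_f C_f + (1 - \alpha_f) C_{out}$, applied layer by layer. The sketch below assumes that operator, grayscale pixels stored as flat lists, and a black starting canvas; `composite_over` and its layer format are hypothetical, not the paper's implementation.

```python
def composite_over(layers):
    """Back-to-front alpha compositing with the standard 'over' operator.

    layers: list of (colors, alphas) pairs ordered from back (background) to front.
    colors and alphas are equal-length lists of floats in [0, 1], one entry per
    pixel (grayscale for brevity). Returns the composited pixel values.
    """
    if not layers:
        raise ValueError("need at least one layer")
    n = len(layers[0][0])
    out = [0.0] * n  # start from an empty (black) canvas
    for colors, alphas in layers:
        if len(colors) != n or len(alphas) != n:
            raise ValueError("all layers must have the same number of pixels")
        # Each new layer covers what is behind it in proportion to its alpha.
        out = [a * c + (1.0 - a) * o for c, a, o in zip(colors, alphas, out)]
    return out


background = ([0.2, 0.2, 0.2], [1.0, 1.0, 1.0])
foreground = ([1.0, 1.0, 1.0], [1.0, 0.0, 0.5])
image = composite_over([background, foreground])
```

Here the foreground fully covers the first pixel, leaves the second untouched, and blends half-and-half on the third, giving approximately `[1.0, 0.2, 0.6]`. If a segmentation network supplies the alpha masks, each object's latent set only has to explain the pixels its mask covers.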
