2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr42600.2020.00794
Guided Variational Autoencoder for Disentanglement Learning

Cited by 86 publications (75 citation statements)
References 14 publications
“…This shows that the proposed model is able to provide a rich data representation. We also compare with the recently proposed GUIDE model [62]. …(18), and the Gumbel-Softmax dropout (MVAE-GS) using equation (24) as the objective function, are provided in Table VI.…”
Section: F. Evaluation of the Representation Learning
confidence: 99%
“…Training of a VAE can be understood as maximization of the dataset log-likelihood with the addition of a Kullback-Leibler regularization term D KL [q φ (Z|S), p θ (Z|S)], where p θ (Z|S) is the posterior of the decoder. 28,29 Our VAE architecture is intentionally "vanilla", 33 and our encoder and decoder use a simple architecture unlike more sophisticated VAE implementations which use convolutional layers, 19 multi-stage training, 63 disentanglement learning, 33 Riemannian Brownian motion priors, 64 and more. It reflects a simple VAE architecture that has been implemented as a VAE-GPSM in prior work.…”
Section: vVAE
confidence: 99%
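The KL regularization term quoted above has a well-known closed form in the common case of a diagonal-Gaussian encoder and a standard-normal prior (an assumption not stated in the excerpt). A minimal NumPy sketch, with illustrative function and argument names:

```python
import numpy as np

def kl_diag_gaussian_to_std_normal(mu, logvar):
    # Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ),
    # summed over latent dimensions and averaged over the batch.
    per_dim = 0.5 * (np.exp(logvar) + mu**2 - 1.0 - logvar)
    return float(np.mean(np.sum(per_dim, axis=1)))

# Sanity check: when q equals the prior, the KL term vanishes.
mu = np.zeros((4, 8))      # batch of 4, latent dimension 8
logvar = np.zeros((4, 8))  # log-variance 0 means unit variance
print(kl_diag_gaussian_to_std_normal(mu, logvar))  # → 0.0
```

In a full VAE this term is added to the reconstruction log-likelihood to form the training objective; the sketch covers only the regularizer.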
“…These GPSMs include a pairwise Potts Hamiltonian model with pairwise interaction terms (Mi3), 32 a vanilla variational autoencoder (vVAE), and a site-independent model which does not model covariation (Indep). We choose the simplest, original 28,29 "vanilla" 33 architecture qualitatively similar to that used in previous VAE-GPSM studies, 21 as opposed to more complex VAE-GPSM architectures used by others (see Methods). 19,22 We evaluate the generative capacity of a model using four MSA statistics: pairwise covariance correlations, 2,13,34,35 higher-order marginals (r20), 13 Hamming distance distributions, 2,13,36,37 and statistical energy correlations.…”
Section: Introduction
confidence: 99%
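One of the four MSA statistics named in the excerpt, the Hamming distance distribution, can be sketched as follows; the integer-encoded MSA input format and the function name are illustrative assumptions, not the cited authors' implementation:

```python
import numpy as np

def hamming_distance_distribution(seqs):
    # Pairwise Hamming distances, as fractions of sequence length,
    # over all pairs in an MSA given as a 2-D integer-encoded array
    # of shape (n_sequences, alignment_length).
    n, L = seqs.shape
    dists = []
    for i in range(n):
        for j in range(i + 1, n):
            dists.append(np.mean(seqs[i] != seqs[j]))
    return np.array(dists)

# Toy MSA of 3 sequences of length 4:
msa = np.array([[0, 1, 2, 3],
                [0, 1, 2, 0],
                [3, 1, 2, 3]])
# Pair distances: (0,1)=0.25, (0,2)=0.25, (1,2)=0.5
print(hamming_distance_distribution(msa))
```

Comparing the histogram of these distances between natural and model-generated sequences is one way such a statistic is used to assess generative capacity.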
“…Although remarkable results were obtained by these unsupervised DRL methods using toy datasets such as dSprites [22] and 3D Shapes [23], there is no guarantee that each latent variable corresponds to a single semantically meaningful factor of variation without any inductive bias [10], [24], [25]. Hence, recent DRL studies have focused on introducing into the model an explicit prior that imposes constraints or regularizations based on the underlying structure of complicated real-world images [26], [27], such as translation and rotation [2], [28], hierarchical features [8], [9], [29] and domain-specific knowledge [10].…”
Section: Introduction
confidence: 99%
“…By assuming that an image consists of local subspaces (sets of pixels) corresponding to objects, DRL methods partition the image to disentangle the latent representation, with the latent variables being separated into distinct sets corresponding to the subspaces. However, conventional DRL methods based on image segmentation change the standard VAE backbone to perform image segmentation and representation learning simultaneously [28]. Since the standard VAE has a statistical backbone supported by the variational Bayesian method, architectural changes of the VAE model may cause unstable learning and deteriorate the ability of disentanglement [5], [28].…”
Section: Introduction
confidence: 99%