where $\mathrm{sign}(r) = +1$ if $r \geq 0$, and $\mathrm{sign}(r) = -1$ otherwise. Both logistic regression and the perceptron can be generalized to the multi-category case. The bias term $b$ can be absorbed into the weight parameters $\theta$ if we fix $h_{i1} = 1$. Let $f(X) = h(X)^\top \theta$. $f(X)$ captures the relationship between $X$ and $Y$. Because $h(X)$ is non-linear, $f(X)$ is also non-linear. We say the model is in the linear form because it is linear in $\theta$, i.e., $f(X)$ is a linear combination of the features in $h(X)$. The following are the choices of $h()$ in various discriminative models.

Kernel machine [12]: $h_i = h(X_i)$ is implicit, and the dimension of $h_i$ can potentially be infinite. The implementation of this method is based on the kernel trick $\langle h(X), h(X') \rangle = K(X, X')$, where $K$ is a kernel that is used explicitly by the classifier, such as the support vector machine [12]. $f(X) = h(X)^\top \theta$ belongs to the reproducing kernel Hilbert space, in which the norm of $f$ can be defined as the Euclidean norm of $\theta$, and this norm is used to regularize the model. A Bayesian treatment leads to the Gaussian process, where $\theta$ is assumed to follow $\mathrm{N}(0, \sigma^2 I_d)$, with $I_d$ the identity matrix of dimension $d$. Then $f(X)$ is a Gaussian process with $\mathrm{Cov}(f(X), f(X')) = \sigma^2 K(X, X')$.

Boosting machine [22]: For $h_i = (h_{ik}, k = 1, ..., d)^\top$, each $h_{ik} \in \{+1, -1\}$ is a weak classifier or a binary feature extracted from $X$, and $f(X) = h(X)^\top \theta$ is a committee of weak classifiers.

CART [6]: In the classification and regression trees, there are $d$ rectangular regions $\{R_k, k = 1, ..., d\}$ resulting from recursive binary partitioning of the space of $X$, and each $h_{ik} = 1(X_i \in R_k)$ is the binary indicator such that $h_{ik} = 1$ if $X_i \in R_k$ and $h_{ik} = 0$ otherwise. $f(X) = h(X)^\top \theta$ is a piecewise constant function.

MARS [23]: In the multivariate adaptive regression splines, the components of $h(X)$ are hinge functions such as $\max(0, x_j - t)$ (where $x_j$ is the $j$-th component of $X$, $j = 1, ..., p$, and $t$ is a threshold) and their products. MARS can be considered a continuous version of CART.

Encoder and decoder: In the diagram in (2.1), the transformation $X_i \rightarrow h_i$ is called an encoder, and the transformation $h_i \rightarrow Y_i$ is called a decoder. In the non-hierarchical model, the encoder is designed, and only the decoder is learned (see the code sketch below).

The outcome $Y_i$ can also be continuous or a high-dimensional vector, in which case the learning becomes a regression problem. Both classification and regression are supervised learning problems because for each input $X_i$, an output $Y_i$ is provided as supervision. Reinforcement learning is similar to supervised learning except that the guidance is in the form of a reward function.

2.2. Descriptive models. This subsection describes the linear form of the descriptive models and the maximum likelihood learning algorithm. The descriptive models [113] can be learned in the unsupervised setting, where the $Y_i$ are not observed, as illustrated by the table below:
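As a concrete illustration of the linear form $f(X) = h(X)^\top \theta$ with a designed encoder and a learned decoder, the following is a minimal sketch in Python/NumPy. It is not taken from the original text: the MARS-style hinge thresholds, the synthetic data, and the learning rate are hypothetical choices made only for illustration.

```python
# Minimal sketch: f(X) = h(X)^T theta with a hand-designed encoder h
# (MARS-style hinge features) and a decoder theta learned by logistic
# regression.  All specific choices (thresholds, data, learning rate)
# are hypothetical and only for illustration.
import numpy as np

rng = np.random.default_rng(0)

def h(X, thresholds=(-1.0, 0.0, 1.0)):
    """Designed encoder: a constant feature plus hinge features max(0, x_j - t)."""
    feats = [np.ones(len(X))]               # h_{i1} = 1 absorbs the bias b
    for j in range(X.shape[1]):
        for t in thresholds:
            feats.append(np.maximum(0.0, X[:, j] - t))
    return np.column_stack(feats)

def sigmoid(r):
    return 1.0 / (1.0 + np.exp(-r))

# Synthetic data: Y depends non-linearly on X, while f(X) = h(X)^T theta
# remains linear in theta.
X = rng.normal(size=(500, 2))
Y = (np.abs(X[:, 0]) + 0.5 * X[:, 1] > 1.0).astype(float)

H = h(X)                                    # encoder is fixed (designed)
theta = np.zeros(H.shape[1])                # decoder is learned

# Gradient ascent on the logistic log-likelihood.
lr = 0.1
for _ in range(2000):
    p = sigmoid(H @ theta)
    theta += lr * H.T @ (Y - p) / len(Y)

pred = (sigmoid(H @ theta) >= 0.5).astype(float)
print("training accuracy:", (pred == Y).mean())
```

Replacing the hinge features with CART-style indicator features $1(X \in R_k)$, or with weak classifiers as in boosting, would change only the designed encoder; the learned decoder $\theta$ and the fitting procedure stay the same.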