Understanding the Effectiveness of Lipschitz-Continuity in Generative Adversarial Nets

Zhou, Zhiming; Song, Yuxuan; Yu, Lantao; Wang, Hongwei; Liang, Jiadong; Zhang, Weinan; Zhang, Zhihua; Yu, Yong

doi:10.48550/arxiv.1807.00751

Cited by 5 publications

(8 citation statements)

References 23 publications

“…Thus, a typical strategy is to enforce K-Lipschitz constraint. Yet, in context of DL it is still unclear if and how it is possible to enforce a model to be exact K-Lipschitz, even though there are several recently proposed techniques for this (Gulrajani et al, 2017; Petzka et al, 2017; Miyato et al, 2018; Zhou et al, 2018).…”

Section: PSO With Unit Magnitudes (mentioning)

confidence: 99%

General Probabilistic Surface Optimization and Log Density Estimation

Kopitkov, Indelman. 2019. Preprint.


Probabilistic inference, such as density estimation and distribution transformation, is a fundamental and highly important problem that needs to be solved in many different domains. Recently, a lot of research was done to solve it using Deep Learning (DL) approaches, including unnormalized and energy models, as well as Generative Adversarial Networks, where DL has shown top approximation performance. In this paper we contribute a novel algorithm family, which generalizes all above, and allows to infer different statistical modalities (e.g. data likelihood and ratio between densities) from data samples. The proposed unsupervised technique, named Probabilistic Surface Optimization (PSO), views a neural network (NN) as a flexible surface which can be pushed according to loss-specific virtual stochastic forces, where a dynamical equilibrium is achieved when the point-wise forces on the surface become equal. Concretely, the surface is pushed up and down at points sampled from two different distributions, with overall up and down forces becoming functions of these two distribution densities and of force intensity magnitudes defined by loss of a particular PSO instance. The eventual force equilibrium upon convergence enforces the NN to be equal to various statistical functions depending on the used magnitude functions, such as data density. Furthermore, this dynamical-statistical equilibrium is extremely intuitive and useful, providing many implications and possible usages in probabilistic inference. Further, we connect PSO to numerous existing statistical works which are also PSO instances, and derive new PSO-based inference methods as demonstration of PSO exceptional usability. Likewise, based on the insights coming from the virtual-force perspective we analyze PSO stability and propose new ways to improve it. 
Finally, we present new instances of PSO, termed PSO-LDE, for data density estimation on logarithmic scale and also provide a new NN block-diagonal architecture for increased surface flexibility, which significantly improves estimation accuracy. Both PSO-LDE and the new architecture are combined together as a new density estimation technique. In our experiments we demonstrate this technique to be superior over state-of-the-art baselines in density estimation task for multi-modal 20D data.
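To make the K-Lipschitz notion in the citation statement above concrete: a function f is K-Lipschitz when |f(x) - f(y)| <= K |x - y| for all x and y. The sketch below is not taken from either paper. It estimates that ratio from sampled pairs in one dimension, and the helper name `lipschitz_lower_bound` and both test functions are illustrative assumptions. Because only finitely many pairs are checked, the result is a lower bound on the true constant, which hints at why confirming an exact constant for a neural network is hard.

```python
import random


def lipschitz_lower_bound(f, samples):
    """Largest observed |f(x) - f(y)| / |x - y| over all sample pairs.

    This is only a lower bound on the true Lipschitz constant of f,
    since unseen pairs may have a steeper slope.
    """
    best = 0.0
    for i, x in enumerate(samples):
        for y in samples[i + 1:]:
            if x != y:
                best = max(best, abs(f(x) - f(y)) / abs(x - y))
    return best


def clip(x, lo=-1.0, hi=1.0):
    """Clamp x to [lo, hi]; this function is 1-Lipschitz."""
    return max(lo, min(hi, x))


rng = random.Random(0)  # fixed seed so the run is reproducible
points = [rng.uniform(-3.0, 3.0) for _ in range(200)]

bound_clip = lipschitz_lower_bound(clip, points)
bound_scaled = lipschitz_lower_bound(lambda x: 2.5 * x, points)
```

For the linear map every pair has the same slope, so the estimate matches the true constant; for a general network no such guarantee holds.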



“…We test the effects of training stabilization brought by the SsGAN. We consider two types of hyper-parameter settings: First, controlling the Lipschitz constant of the discriminator, which is a central quantity analyzed in the GAN literature [12,32]. We consider two state-of-the-art techniques: Gradient Penalty [11], and Spectral Normalization [12].…”

Section: Robustness Test (mentioning)

confidence: 99%

Self-Supervised GAN to Counter Forgetting

Chen, Zhai, Houlsby. 2018. Preprint.


GANs involve training two networks in an adversarial game, where each network's task depends on its adversary. Recently, several works have framed GAN training as an online or continual learning problem [1][2][3][4][5][6]. We focus on the discriminator, which must perform classification under an (adversarially) shifting data distribution. When trained on sequential tasks, neural networks exhibit forgetting. For GANs, discriminator forgetting leads to training instability [1]. To counter forgetting, we encourage the discriminator to maintain useful representations by adding a self-supervision. Conditional GANs have a similar effect using labels. However, our self-supervised GAN does not require labels, and closes the performance gap between conditional and unconditional models. We show that, in doing so, the self-supervised discriminator learns better representations than regular GANs.
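The citation statement for this paper names Spectral Normalization as one way to control the discriminator's Lipschitz constant. The core idea is to divide a weight matrix by its largest singular value, so the linear map it defines has spectral norm 1. The following is a minimal sketch of that idea on a fixed matrix in plain Python, not the implementation used in any of the papers listed here. The function names `spectral_norm` and `normalize` are hypothetical, and the sketch assumes the all-equal start vector is not orthogonal to the top singular direction.

```python
import math


def spectral_norm(W, iters=100):
    """Estimate the largest singular value of W (a list of rows)
    by power iteration on W^T W."""
    rows, cols = len(W), len(W[0])
    v = [1.0 / math.sqrt(cols)] * cols  # unit-length start vector (assumption)
    sigma = 0.0
    for _ in range(iters):
        # u = W v, and its length is the current singular-value estimate
        u = [sum(W[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        sigma = math.sqrt(sum(x * x for x in u))
        if sigma == 0.0:
            return 0.0
        u = [x / sigma for x in u]
        # v = W^T u, renormalized to unit length for the next round
        v = [sum(W[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm_v = math.sqrt(sum(x * x for x in v))
        if norm_v == 0.0:
            return 0.0
        v = [x / norm_v for x in v]
    return sigma


def normalize(W):
    """Return W divided by its estimated spectral norm."""
    s = spectral_norm(W)
    return [[x / s for x in row] for row in W]


W = [[3.0, 0.0], [0.0, 1.0]]
sigma_W = spectral_norm(W)             # close to 3.0 for this diagonal matrix
sigma_after = spectral_norm(normalize(W))
```

Running many iterations per call is a simplification chosen for clarity; a training loop would more likely reuse the iteration vector across steps to keep the cost low.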


“…The proposition shows that the optimal $f^*$ provides informative gradient [57] from $q$ towards $p_r$. We then generalize the conclusion to $p_\theta$ by considering correlation between $q$ and $p_\theta$.…”

Section: Theoretical Analysis (mentioning)

confidence: 92%

“…Research on the Lipschitz continuity of GAN discriminators have resulted in the theory of "informative gradients" [56,57]. Under certain mild conditions, a Lipschitz discriminator can provide informative gradient to the generator in a GAN framework: when $p_\theta$ and $p_r$ are disjoint, the gradient $\nabla f^*(x)$ of optimal discriminator $f^*$ w.r.t. each sample $x \sim p_\theta$ points to a sample $x^* \sim p_r$, which guarantees that the generation distribution $p_\theta$ is moving towards $p_r$.…”

Section: Related Work (mentioning)

confidence: 99%

“…Although [25] shows preliminary connections between PR and GAN, the proposed PR framework does not provide informative gradient to the generator when treated as a GAN loss. Following [57], we consider the training problem when the discriminator (i.e. $f_\phi(x)$ here) is optimal: when discriminator $f^*_\phi(x)$ is optimal, then the gradient of generator g(…”

Section: Appendix (mentioning)

confidence: 99%


Improving GAN Training with Probability Ratio Clipping and Sample Reweighting

Wu, Zhou, Wilson et al. 2020. Preprint.


Despite success on a wide range of problems related to vision, generative adversarial networks (GANs) can suffer from inferior performance due to unstable training, especially for text generation. We propose a new variational GAN training framework which enjoys superior training stability. Our approach is inspired by a connection of GANs and reinforcement learning under a variational perspective. The connection leads to (1) probability ratio clipping that regularizes generator training to prevent excessively large updates, and (2) a sample re-weighting mechanism that stabilizes discriminator training by downplaying bad-quality fake samples. We provide theoretical analysis on the convergence of our approach. By plugging the training approach in diverse state-of-the-art GAN architectures, we obtain significantly improved performance over a range of tasks, including text generation, text style transfer, and image generation.
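The abstract does not spell out the clipped objective, so the following is only a sketch of how probability ratio clipping can cap an update, written in the style of the clipped surrogate familiar from policy-gradient reinforcement learning. The function name `clipped_surrogate`, the `advantage` weight, and the default `eps=0.2` are all assumptions for illustration and may differ from what the paper actually uses.

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """Clipped surrogate value for one sample (illustrative assumption).

    ratio     : new probability divided by old probability of the sample
    advantage : weight saying how much the sample should be encouraged
    eps       : half-width of the trust interval around ratio = 1
    """
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    # Taking the minimum removes any incentive to push the ratio
    # beyond the interval in the direction that increases the objective.
    return min(ratio * advantage, clipped * advantage)


# With a positive advantage the value stops growing once ratio > 1 + eps.
values = [clipped_surrogate(r, 1.0) for r in (0.5, 1.0, 1.5, 3.0)]
```

Once the ratio leaves the interval on the favorable side, the value is constant in the ratio, so its gradient vanishes and the update cannot grow without bound.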

