A Connection Between Score Matching and Denoising Autoencoders

Vincent, Pascal

doi:10.1162/neco_a_00142

Cited by 646 publications

(482 citation statements)

References 11 publications

Citation types: Supporting, Mentioning (476), Contrasting, Unclassified

“…In [16,34,8], advanced sampling methods for computing the negative part of the gradient based on tempering were proposed and shown to improve and stabilize learning. […] A single-layer DAE is a special form of multi-layer perceptron network with a single hidden layer and a tied set of weights [45] (see Figure 3 (b)). A DAE is a network that reconstructs a corrupted input vector as well as possible by minimizing the following cost function…”

Section: Restricted Boltzmann Machines and Denoising Autoencoders (mentioning)

confidence: 99%

How to Pretrain Deep Boltzmann Machines in Two Stages

Cho

Raiko

Ilin

et al. 2015

Springer Series in Bio-/Neuroinformatics

A deep Boltzmann machine (DBM) is a recently introduced Markov random field model that has multiple layers of hidden units. It has been shown empirically that it is difficult to train a DBM with approximate maximum-likelihood learning using the stochastic gradient unlike its simpler special case, restricted Boltzmann machine (RBM). In this paper, we propose a novel pretraining algorithm that consists of two stages; obtaining approximate posterior distributions over hidden units from a simpler model and maximizing the variational lower-bound given the fixed hidden posterior distributions. We show empirically that the proposed method overcomes the difficulty in training DBMs from randomly initialized parameters and results in a better, or comparable, generative model when compared to the conventional pretraining algorithm.

“…This method, which consists of alternately adding noise to a sample and denoising it, yields competitive performance in terms of estimated log-likelihood of the samples. An important connection was also made by Vincent [27], who showed that optimising the training objective of a denoising autoencoder is equivalent to performing score matching [17] between the Parzen density estimator of the training data and a particular energy-based model. Composite denoising autoencoders learn a diverse representation by leveraging the observation that the types of features learnt by the standard denoising autoencoders differ depending on the level of noise.…”

Section: Introduction (mentioning)

confidence: 99%
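
The equivalence described in the statement above can be checked numerically on a single sample. The sketch below is illustrative only: it assumes Gaussian corruption with standard deviation sigma, a sigmoid encoder, a linear decoder, and tied weights, and all names (W, b, c, reconstruct, model_score) are hypothetical. Under those assumptions the denoising score matching loss equals the squared reconstruction error of the autoencoder divided by 2 * sigma**4.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, sigma = 5, 3, 0.5            # input size, hidden size, noise level (arbitrary)
W = rng.normal(size=(k, d))        # tied weights: encoder uses W, decoder uses W.T
b = rng.normal(size=k)             # hidden bias
c = rng.normal(size=d)             # visible bias


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def reconstruct(x_tilde):
    """Autoencoder output: sigmoid encoder, linear decoder, tied weights."""
    return W.T @ sigmoid(W @ x_tilde + b) + c


def model_score(x_tilde):
    """Score of the assumed energy-based model, written via the reconstruction."""
    return (reconstruct(x_tilde) - x_tilde) / sigma**2


x = rng.normal(size=d)                      # clean sample
x_tilde = x + sigma * rng.normal(size=d)    # Gaussian-corrupted sample

# Denoising score matching target: gradient of log q(x_tilde | x) w.r.t. x_tilde.
target = (x - x_tilde) / sigma**2
dsm_loss = 0.5 * np.sum((model_score(x_tilde) - target) ** 2)

# Denoising autoencoder loss: squared error against the clean sample.
dae_loss = np.sum((reconstruct(x_tilde) - x) ** 2)

print(np.isclose(dsm_loss, dae_loss / (2 * sigma**4)))
```

The two losses differ only by a constant factor that does not depend on the parameters, so they share the same minimizers for a fixed sigma.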

“…It can be useful to allow the transfer function h for the decoder to be different from that for the encoder. Typically, W and W′ are constrained by W′ = W^T, which has been justified theoretically by Vincent [27].…”

Section: Introduction (mentioning)

confidence: 99%
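
One way to read the tied-weights justification mentioned above: with the decoder weights set to the transpose of the encoder weights, the vector field (reconstruction minus input) / sigma**2 is the negative gradient of a scalar energy. The sketch below checks this by finite differences. The energy function, the sigmoid encoder, the linear decoder, and every name in the code are assumptions chosen for illustration, not taken from the citing paper.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, sigma = 4, 6, 0.5
W = rng.normal(size=(k, d))   # shared by encoder (W) and decoder (W.T)
b = rng.normal(size=k)
c = rng.normal(size=d)


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def softplus(a):
    # Numerically stable log(1 + exp(a)).
    return np.logaddexp(0.0, a)


def energy(x):
    """Assumed energy whose negative gradient is the tied-weight score."""
    return -(c @ x - 0.5 * x @ x + np.sum(softplus(W @ x + b))) / sigma**2


def score_tied(x):
    """(reconstruction - input) / sigma^2 with decoder weights W.T."""
    return (W.T @ sigmoid(W @ x + b) + c - x) / sigma**2


def neg_grad_fd(f, x, eps=1e-6):
    """Negative gradient of f at x by central finite differences."""
    g = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        g[i] = -(f(x + step) - f(x - step)) / (2 * eps)
    return g


x = rng.normal(size=d)
print(np.allclose(score_tied(x), neg_grad_fd(energy, x), atol=1e-5))
```

If the decoder used an unrelated matrix instead of W.T, the same vector field would in general have a non-symmetric Jacobian and so could not be the gradient of any scalar energy.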

Composite Denoising Autoencoders

Geras

Sutton

2016

Machine Learning and Knowledge Discovery in Databases

Abstract. In representation learning, it is often desirable to learn features at different levels of scale. For example, in image data, some edges will span only a few pixels, whereas others will span a large portion of the image. We introduce an unsupervised representation learning method called a composite denoising autoencoder (CDA) to address this. We exploit the observation from previous work that in a denoising autoencoder, training with lower levels of noise results in more specific, fine-grained features. In a CDA, different parts of the network are trained with different versions of the same input, corrupted at different noise levels. We introduce a novel cascaded training procedure which is designed to avoid types of bad solutions that are specific to CDAs. We show that CDAs learn effective representations on two different image data sets.

“…Expression of a data is studied or initial data is coded effectively by hidden layer. According to the study of literatures [19][20][21][22][23][24][25][26][27][28][29], Deep Learning can gain more representative characteristic information by training large scale data. Thus, the sample may be classified and estimated to improve precision of information.…”

Section: Introduction (mentioning)

confidence: 99%

Semiactive Nonsmooth Control for Building Structure with Deep Learning

Wang

Huang

et al. 2017

Complexity

Aiming at suppressing harmful effect for building structure by surface motion, semiactive nonsmooth control algorithm with Deep Learning is proposed. By finite-time stable theory, the building structure closed-loop system's stability is discussed under the proposed control algorithm. It is found that the building structure closed-loop system is stable. Then the proposed control algorithm is applied on controlling the building structural vibration. The seismic action is chosen as El Centro seismic wave. Dynamic characteristics have comparative analysis between semiactive nonsmooth control and passive control in two simulation examples. They demonstrate that the designed control algorithm has great robustness and anti-interference. The proposed control algorithm is more effective than passive control in suppressing structural vibration.
