1988
DOI: 10.1007/bf00332918

Auto-association by multilayer perceptrons and singular value decomposition

Cited by 1,175 publications (640 citation statements)
References 6 publications
“…A hidden layer in each part enables the network to perform nonlinear mapping functions. Without these hidden layers, the network would only be able to perform linear PCA even with nonlinear units in the component layer, as shown by Bourlard and Kamp [29]. To regularise the network, a weight-decay term is added, E_total = E + ν Σ_i w_i², in order to penalise large network weights w. In most experiments, ν = 0.001 was a reasonable choice.…”
Section: Standard Nonlinear PCA (mentioning)
confidence: 99%
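As a rough illustration of the regularised objective quoted above, the sketch below evaluates the reconstruction error of a small nonlinear autoencoder plus the weight-decay penalty E_total = E + ν Σ_i w_i². It is a minimal NumPy sketch; the layer sizes, the synthetic data, and the choice ν = 0.001 for this toy setting are assumptions for illustration, not taken from the cited papers.

# Minimal sketch (NumPy, hypothetical layer sizes) of a regularised
# autoencoder objective: reconstruction error plus a weight-decay penalty
# E_total = E + nu * sum_i w_i**2 with nu = 0.001.
import numpy as np

rng = np.random.default_rng(0)

def init(shape):
    return rng.normal(scale=0.1, size=shape)

# Five-layer autoencoder: input -> hidden -> bottleneck -> hidden -> output.
# The nonlinear (tanh) hidden layers are what allow nonlinear mappings;
# without them the network reduces to the linear case analysed by
# Bourlard and Kamp.
d, h, k = 10, 8, 2            # input dim, hidden width, bottleneck size (assumed)
weights = [init((d, h)), init((h, k)), init((k, h)), init((h, d))]

def forward(x, weights):
    w1, w2, w3, w4 = weights
    z = np.tanh(x @ w1) @ w2          # encode to bottleneck components
    return np.tanh(z @ w3) @ w4       # decode back to input space

def total_error(x, weights, nu=0.001):
    recon = forward(x, weights)
    e = np.mean((x - recon) ** 2)                   # reconstruction error E
    penalty = sum(np.sum(w ** 2) for w in weights)  # sum of squared weights
    return e + nu * penalty                         # E_total = E + nu * sum w^2

x = rng.normal(size=(100, d))
print(total_error(x, weights))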
“…Standard optimization methods, such as the gradient descent algorithm, can be used for estimating the network weights. It has been shown that a three-layer autoencoder with a bottleneck in the second layer performs a linear dimensionality reduction specified by the principal components of the input distribution (Bourlard & Kamp, 1988; Baldi & Hornik, 1989; Oja, 1991). Specifically, if an autoencoder containing M hidden units is trained by minimizing the error function (17), the network projects the data onto the M-dimensional sub-space spanned by the first M principal components of the data.…”
Section: Linear and Non-linear Autoencoders for Dimensionality Reduction (mentioning)
confidence: 99%
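The claim quoted above, that a trained linear autoencoder with an M-unit bottleneck projects onto the span of the first M principal components, can be checked numerically. The following NumPy sketch trains a linear bottleneck by plain gradient descent and compares its reconstruction error with that of the rank-M PCA projection obtained from the SVD; after training, the two errors should nearly coincide. The data dimensions, learning rate, and iteration count here are assumptions chosen for illustration, not values from the cited papers.

# Sketch: a linear autoencoder with an M-unit bottleneck converges to the
# same reconstruction error as projection onto the first M principal components.
import numpy as np

rng = np.random.default_rng(1)

# Synthetic centred data with a few dominant directions (assumed example).
n, d, M = 300, 5, 2
X = rng.normal(size=(n, d)) * np.array([3.0, 2.0, 1.0, 0.5, 0.2])
X -= X.mean(axis=0)

# Rank-M PCA reconstruction via the SVD of the data matrix.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_pca = X @ Vt[:M].T @ Vt[:M]
err_pca = np.mean((X - X_pca) ** 2)

# Linear autoencoder x_hat = x W1 W2 with an M-unit bottleneck, trained by
# gradient descent on the mean squared reconstruction error (no nonlinearities).
W1 = rng.normal(scale=0.1, size=(d, M))
W2 = rng.normal(scale=0.1, size=(M, d))
lr = 0.05
for _ in range(5000):
    R = X @ W1 @ W2 - X                 # reconstruction residual
    gW2 = 2.0 / n * W1.T @ X.T @ R      # gradient w.r.t. decoder weights
    gW1 = 2.0 / n * X.T @ R @ W2.T      # gradient w.r.t. encoder weights
    W1 -= lr * gW1
    W2 -= lr * gW2

err_ae = np.mean((X - X @ W1 @ W2) ** 2)
print(err_pca, err_ae)   # the autoencoder error should approach the PCA error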
“…Interestingly, such a feature-generating system already exists and in fact has become quite popular through the rise of deep learning (Bengio et al., 2007; Hinton et al., 2006; Le et al., 2012; Marc'Aurelio et al., 2007): the autoencoder (Hinton and Zemel, 1994; Bourlard and Kamp, 1988). Autoencoders, which can be trained through a variety of algorithms from Restricted Boltzmann Machines (RBMs) (Hinton et al., 2006) to more conventional stochastic gradient descent (Le et al., 2012) (e.g.…”
Section: Introduction (mentioning)
confidence: 99%