2012
DOI: 10.1007/978-3-642-35289-8_27

Training Deep and Recurrent Networks with Hessian-Free Optimization

Cited by 388 publications (497 citation statements). References 29 publications.
“…Also in 2011 it was shown (Martens and Sutskever, 2011) that Hessian-free optimization (e.g., Møller, 1993; Pearlmutter, 1994; Schraudolph, 2002) (Sec. 5.6.2) can alleviate the Fundamental Deep Learning Problem (Sec.…”
Section: Hessian-Free Optimization for RNNs
confidence: 99%
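
The mechanism referenced above can be sketched compactly: a Hessian-free (truncated-Newton) method never forms the Hessian explicitly, but computes Hessian-vector products on the fly and feeds them to conjugate gradient (CG). Below is a minimal illustrative sketch; the toy quadratic loss, the finite-difference Hessian-vector product (standing in for Pearlmutter's exact R-operator), and all constants are assumptions chosen for demonstration, not the setup of Martens and Sutskever.

```python
import numpy as np

A = np.diag([100.0, 1.0, 0.01])  # toy ill-conditioned curvature matrix

def grad(w):
    # Gradient of the toy quadratic loss 0.5 * w^T A w.
    return A @ w

def hess_vec(w, v, eps=1e-6):
    # Hessian-vector product from two gradient calls (finite differences);
    # in practice this is computed exactly with Pearlmutter's R-operator.
    return (grad(w + eps * v) - grad(w - eps * v)) / (2 * eps)

def cg(w, b, max_iters=50, tol=1e-10):
    # Approximately solve H x = b using only Hessian-vector products;
    # truncating CG early is what makes the method "Hessian-free".
    x = np.zeros_like(b)
    r = b - hess_vec(w, x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iters):
        if rs < tol:
            break
        Hp = hess_vec(w, p)
        alpha = rs / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

w = np.array([1.0, 1.0, 1.0])
for _ in range(5):
    w += cg(w, -grad(w))  # approximate Newton step from truncated CG
print(w)                  # converges to the minimizer at the origin
```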
“…While HF, like all truncated-Newton methods, takes steps computed using partially converged calls to CG, it is naturally accelerated along at least some directions of lower curvature compared to the gradient. It can even be shown (Martens & Sutskever, 2012) that CG will tend to favor convergence to the exact solution to the quadratic sub-problem first along higher curvature directions (with a bias towards those which are more clustered together in their curvature-scalars/eigenvalues).…”
Section: Momentum and HF
confidence: 99%
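
The cited behavior is easy to check numerically on a small quadratic sub-problem. In the sketch below, the diagonal curvature matrix is an arbitrary assumption (clustered high eigenvalues 100 and 90, an outlier 0.01) chosen to make the effect visible: CG resolves the high-curvature, clustered directions in its first iterations and the low-curvature direction last.

```python
import numpy as np

curvatures = np.array([100.0, 90.0, 1.0, 0.01])  # eigenvalues of H
H = np.diag(curvatures)
x_exact = np.ones(4)   # target solution: one unit of error per direction
b = H @ x_exact        # right-hand side of the sub-problem H x = b

x = np.zeros(4)
r = b.copy()
p = r.copy()
rs = r @ r
for k in range(4):
    Hp = H @ p
    alpha = rs / (p @ Hp)
    x += alpha * p
    r -= alpha * Hp
    rs_new = r @ r
    p = r + (rs_new / rs) * p
    rs = rs_new
    # The clustered high-curvature directions (100, 90) are resolved in
    # the first iterations; the low-curvature one (0.01) converges last.
    print(f"iter {k + 1}: |error| per direction = {np.abs(x - x_exact)}")
```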
“…In order to analyze continuous time series of network data with highly complex structure, the RNN-GBRBM (modified RNN-RBM) is adopted. Combining the desirable characteristics of RNNs and RBMs has proven to be non-trivial [16], because the RNN enables the network to have a simple version of memory with very minimal overhead and allows more freedom in describing the temporal dependencies involved [17], and because the RBM can capture complicated, high-order correlations between the activities of hidden features [18] and provides a closed-form representation of the distribution underlying the observations [10]. Moreover, a semi-supervised incremental updating algorithm, which is appropriate for training the decoder and updating the parameters of the classifier, is proposed.…”
Section: System Architecture
confidence: 99%
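
For context on the RBM side of this combination: a Gaussian-Bernoulli RBM (the GBRBM component mentioned above) defines the closed-form joint distribution via an energy function, with p(v, h) proportional to exp(-E(v, h)). The excerpt does not quote the cited paper's exact parameterization, so what follows is the standard textbook form, not necessarily theirs.

```latex
% Standard Gaussian-Bernoulli RBM energy over real-valued visibles v_i
% and binary hiddens h_j; p(v, h) \propto \exp(-E(v, h)).
E(\mathbf{v}, \mathbf{h}) =
    \sum_i \frac{(v_i - b_i)^2}{2\sigma_i^2}
  - \sum_j c_j h_j
  - \sum_{i,j} \frac{v_i}{\sigma_i} W_{ij} h_j
```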
“…s is defined as in Eq. (17), where x_t is the t-th network data in a network data sequence, d_decoded is the dimensionality of the decoded feature vector s(x_t, Δt), s_k(x_t) is a binary value indicating the k-th code of the decoded features, and Δt is the number of hidden units in the RNN, which indicates that s is encoded with Δt decoded features h(x_t), h(x_{t+1}), .…”
Section: RNN-GBRBM
confidence: 99%
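
Eq. (17) itself is not reproduced in the excerpt, so the exact definition of s(x_t, Δt) is unknown here. The sketch below only wires together the quantities the excerpt names, a window of Δt decoded binary feature vectors combined into one vector of dimensionality d_decoded; the decode function, the concatenation, and all shapes are hypothetical stand-ins, not the paper's definition.

```python
import numpy as np

def decode(x_t, codebook):
    # Hypothetical decoder: project and binarize, standing in for h(x_t).
    return (codebook @ x_t > 0).astype(np.int8)

def s(sequence, t, delta_t, codebook):
    # Combine Δt consecutive decoded features into one binary vector;
    # each entry plays the role of s_k(x_t) in the excerpt. Concatenation
    # is an assumption; Eq. (17) may combine them differently.
    window = [decode(sequence[t + i], codebook) for i in range(delta_t)]
    return np.concatenate(window)  # length d_decoded = Δt * codes per step

rng = np.random.default_rng(0)
sequence = rng.normal(size=(10, 8))  # toy "network data" sequence
codebook = rng.normal(size=(4, 8))   # hypothetical decoder weights
print(s(sequence, t=0, delta_t=3, codebook=codebook))  # 12-dim binary code
```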