Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005.
DOI: 10.1109/ijcnn.2005.1555994

Second-order backpropagation algorithms for a stagewise-partitioned separable Hessian matrix

Abstract: Recent advances in computer technology allow the implementation of some important methods that were assigned lower priority in the past due to their computational burdens. Second-order backpropagation (BP) is such a method that computes the exact Hessian matrix of a given objective function. We describe two algorithms for feed-forward neural-network (NN) learning with emphasis on how to organize Hessian elements into a so-called stagewise-partitioned block-arrow matrix form: (1) stagewise BP, an extension of th…
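
As a reading aid for the abstract's "stagewise-partitioned block-arrow" phrase, here is a minimal sketch (our own, not taken from the paper) of the sparsity pattern such a Hessian takes for a single-hidden-layer MLP under a sum-of-squared-errors objective: the hidden-to-output parameters of each of the F output nodes form F independent diagonal blocks, while the shared input-to-hidden parameters form the arrow border that couples them. The function name and layer-size arguments are hypothetical.

```python
# Hypothetical sketch (not from the paper): the block-arrow sparsity pattern of the
# Hessian for a single-hidden-layer MLP, partitioned stagewise so that the
# hidden-to-output parameters of each of the F output nodes form F independent
# diagonal blocks, while the shared input-to-hidden parameters form the arrow
# border that couples all of them.
import numpy as np

def block_arrow_mask(n_in, n_hid, n_out):
    per_out = n_hid + 1                 # weights + threshold per output node
    n_stage2 = n_out * per_out          # second-stage (hidden-to-output) parameters
    n_stage1 = n_hid * (n_in + 1)       # shared first-stage (input-to-hidden) parameters
    n = n_stage2 + n_stage1
    mask = np.zeros((n, n), dtype=bool)
    for f in range(n_out):              # one diagonal block per terminal output
        s = slice(f * per_out, (f + 1) * per_out)
        mask[s, s] = True
    mask[n_stage2:, :] = True           # arrow border: shared parameters couple everything
    mask[:, n_stage2:] = True
    return mask

# The 1-2-3 example quoted in the citation statements below has 13 parameters in total:
print(block_arrow_mask(1, 2, 3).astype(int))   # 13x13 block-arrow pattern
```

With the shared block placed last, the arrowhead of this pattern "points downward to the right," matching the orientation mentioned in the citation statements below.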

Cited by 11 publications (16 citation statements). References 12 publications.
“…Needless to say, the latter approach has no chance to exploit negative curvature. Fortunately, the local and global Hessian matrices can be evaluated efficiently by our recently developed second-order stagewise backpropagation at essentially the same cost as the Gauss-Newton Hessian part in CANFIS neuro-fuzzy modular-network learning [19] (as well as in MLP learning [9], [8]). …”
Section: Discussion (mentioning)
confidence: 99%
“…Then, exploiting negative curvature turns out to be effective for avoiding locking onto so-called singular points. Here, our second-order stagewise backpropagation procedure [19], [9], [8] is an indispensable element that makes it very practical to find a descent direction of negative curvature, for it evaluates the entire Hessian H very efficiently (at essentially the same cost as JᵀJ alone). …”
Section: Discussion (mentioning)
confidence: 99%
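
As one hedged illustration of why the entire Hessian H (and not only the Gauss-Newton part JᵀJ) matters for this argument, the sketch below, which is our own and assumes a dense symmetric H is available in memory, extracts a negative-curvature descent direction from the most negative eigenpair; the eigendecomposition route is an illustrative choice, not necessarily the procedure used in the cited work.

```python
# Illustrative only: given the exact (symmetric) Hessian H and the gradient g,
# extract a descent direction of negative curvature when one exists.  The
# Gauss-Newton part J^T J alone could never supply such a direction, since it
# is positive semidefinite.
import numpy as np

def negative_curvature_direction(H, g, tol=1e-10):
    eigvals, eigvecs = np.linalg.eigh(H)    # symmetric eigendecomposition
    lam, d = eigvals[0], eigvecs[:, 0]      # most negative eigenvalue and its eigenvector
    if lam >= -tol:
        return None                         # no usable negative curvature
    if g @ d > 0:                           # orient d so it is also a descent direction
        d = -d
    return d                                # d satisfies d'Hd < 0 and g'd <= 0
```
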
“…In the best-known BP formulation due to Rumelhart et al. [3], xₛ, the vector of "before-node" net inputs [see Equation (6)], is treated as the state vector, whereas in optimal control, yₛ, the vector of "after-node" outputs, is chosen as the state vector. [Figure caption, partially recovered: an example network with] two hidden nodes (P₂ = 2) and three terminal outputs (F ≡ P₃ = 3), hence 13 parameters in total including threshold parameters: (a) the desired block-arrow Hessian matrix, whose arrowhead should point downward to the right (see [4], [5]), with F (= 3) diagonal blocks; (b) a Hessian matrix with a complex sparse pattern, which is hard to exploit, obtained by NETLAB (MATLAB-based software; see mlphess.m at http://www.ncrg.aston.ac.uk/netlab/). For large-scale optimization, it is not advisable to approximate the inverse of the Hessian because it always becomes dense. …”
Section: Introduction (mentioning)
confidence: 99%
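
The closing remark about not approximating the dense inverse suggests one way the block-arrow structure can be exploited. The following is a sketch under our own assumptions (block sizes, names, and nonsingular diagonal blocks are ours, not the paper's): a Newton-type system with a block-arrow coefficient matrix can be solved by F small per-block eliminations plus one Schur-complement solve in the shared border variables, so the dense inverse never needs to be formed.

```python
# Hypothetical sketch (block sizes and names are ours): solve the Newton-type
# system  [[D, B^T], [B, C]] [p_d; p_b] = -[g_d; g_b]  for a block-arrow Hessian
# with D = blockdiag(D_1, ..., D_F), without ever forming the dense inverse.
# Assumes every diagonal block D_f is nonsingular.
import numpy as np

def solve_block_arrow(D_blocks, B, C, g_d, g_b):
    sizes = [Df.shape[0] for Df in D_blocks]
    offs = np.cumsum([0] + sizes)
    # Apply D^{-1} blockwise to B^T and to g_d (F small solves instead of one big one).
    Dinv_Bt = np.vstack([np.linalg.solve(Df, B[:, offs[f]:offs[f + 1]].T)
                         for f, Df in enumerate(D_blocks)])
    Dinv_gd = np.concatenate([np.linalg.solve(Df, g_d[offs[f]:offs[f + 1]])
                              for f, Df in enumerate(D_blocks)])
    S = C - B @ Dinv_Bt                     # small Schur complement in the border variables
    p_b = np.linalg.solve(S, B @ Dinv_gd - g_b)
    p_d = -(Dinv_gd + Dinv_Bt @ p_b)
    return p_d, p_b
```

The cost is then dominated by the F small block factorizations and the border solve, rather than by one dense factorization of the full parameter-space Hessian.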