Recent advances in computer technology allow the implementation of some important methods that were assigned lower priority in the past due to their computational burdens. Second-order backpropagation (BP) is such a method: it computes the exact Hessian matrix of a given objective function. We describe two algorithms for feed-forward neural-network (NN) learning, with emphasis on how to organize Hessian elements into a so-called stagewise-partitioned block-arrow matrix form: (1) stagewise BP, an extension of the discrete-time optimal-control stagewise Newton of Dreyfus (1966); and (2) nodewise BP, based on a direct implementation of the chain rule for differentiation attributable to Bishop (1992). The former, a more systematic and cost-efficient implementation in both memory and operations, progresses in the same layer-by-layer (i.e., stagewise) fashion in which the widely employed first-order BP computes the gradient vector. We also show intriguing separable structures of each block in the partitioned Hessian, disclosing the rank of each block.

I. INTRODUCTION

In multi-stage optimal control problems, second-order optimization procedures (see [8] and references therein) proceed in a stagewise manner since $N$, the number of stages, is often very large. Naturally, those methods can be employed for optimizing multi-stage feed-forward neural networks. In this paper, we focus on an $N$-layered multilayer perceptron (MLP), which gives rise to an $N$-stage decision-making problem. At each stage $s$, we assume there are $P_s$ ($s = 1, \ldots, N$) states (or nodes) and $n_s$ ($s = 1, \ldots, N-1$) decision parameters (or weights), denoted by an $n_s$-vector $\theta^{s,s+1}$ (between layers $s$ and $s+1$). No decisions are to be made at the terminal stage $N$ (or layer $N$); hence, there are $N-1$ decision stages in total.

To compute the gradient vector for optimization purposes, we employ the "first-order" backpropagation (BP) process [5], [6], [7], which consists of two major procedures: a forward pass and a backward pass [see later Eq. (2)]. A forward-pass situation in MLP learning, where the node outputs in layer $s-1$ (denoted by $y^{s-1}$) affect the node outputs in the next layer $s$ (i.e., $y^{s}$) via connection parameters (denoted by $\theta^{s-1,s}$, between those two layers), can be interpreted as a situation in optimal control where state $y^{s-1}$ at stage $s-1$ is moved to state $y^{s}$ at the next stage $s$ by decisions $\theta^{s-1,s}$. In the backward pass, sensitivities of the objective function $E$ with respect to states (i.e., node sensitivities) are propagated from one stage back to another while computing gradient and Hessian elements. However, MLPs exhibit a great deal of structure, which turns out to be a very special case in optimal control; for instance, the "after-node" outputs (or states) are evaluated individually at each stage as $y^{s}_{j} = f^{s}_{j}(x^{s}_{j})$, where $f(\cdot)$ denotes a differentiable state-transition function of nonlinear dynamics, and $x^{s}_{j}$, the "before-node" net input to node $j$ at layer $s$, depends on only a subset of all decisions taken at stage $s-1$. In spite of this distinction and others, using a vector of states as a basic ingredient...
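As a concrete illustration of the first-order forward/backward mechanics just described (not of the second-order Hessian algorithms developed in this paper), the following minimal Python sketch propagates states $y^{s}$ forward through the $N-1$ decision stages and node sensitivities $\partial E/\partial y^{s}$ backward, accumulating the gradient with respect to each $\theta^{s,s+1}$. The function names, the bias-augmented weight layout, and the tanh transfer function are illustrative assumptions, not notation taken from the paper.

    import numpy as np

    def forward_pass(y1, thetas, f, fprime):
        """Propagate the stage-1 state y^1 through the N-1 decision stages.

        y1       : input-layer outputs (stage-1 states), a 1-D array
        thetas   : list of N-1 weight matrices; thetas[s] maps the bias-augmented
                   layer-(s+1) outputs to the layer-(s+2) net inputs
        f, fprime: nodewise transfer function and its derivative (vectorized)
        Returns per-stage net inputs x, outputs y, and derivatives f'(x).
        """
        ys, xs, dfs = [y1], [], []
        for W in thetas:
            x = W @ np.append(ys[-1], 1.0)   # "before-node" net inputs of the next layer
            xs.append(x)
            ys.append(f(x))                  # "after-node" outputs y = f(x)
            dfs.append(fprime(x))
        return xs, ys, dfs

    def backward_pass(ys, dfs, thetas, dE_dyN):
        """First-order BP: propagate node sensitivities dE/dy^s stage by stage
        and accumulate the gradient dE/dtheta^{s,s+1} at each decision stage."""
        grads = [None] * len(thetas)
        xi = dE_dyN                          # sensitivity at the terminal stage N
        for s in reversed(range(len(thetas))):
            delta = xi * dfs[s]              # dE/dx at the downstream layer
            grads[s] = np.outer(delta, np.append(ys[s], 1.0))
            xi = thetas[s][:, :-1].T @ delta # sensitivity passed back one stage
        return grads

    # Hypothetical usage: a 3-layer MLP (N = 3) with layer sizes 4 -> 5 -> 2 and
    # squared-error objective E = 0.5 * ||y^N - t||^2, so dE/dy^N = y^N - t.
    rng = np.random.default_rng(0)
    sizes = [4, 5, 2]
    thetas = [rng.standard_normal((sizes[s + 1], sizes[s] + 1))
              for s in range(len(sizes) - 1)]
    xs, ys, dfs = forward_pass(rng.standard_normal(sizes[0]), thetas,
                               np.tanh, lambda x: 1.0 - np.tanh(x) ** 2)
    grads = backward_pass(ys, dfs, thetas, ys[-1] - np.zeros(sizes[-1]))

In this sketch each weight matrix carries its bias as a last column, so $\theta^{s,s+1}$ maps the augmented state $[y^{s}; 1]$ to the before-node inputs of layer $s+1$; this is one common convention and is assumed here only for compactness.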