Recent advances in computer technology allow the implementation of some important methods that were assigned lower priority in the past due to their computational burdens. Second-order backpropagation (BP) is such a method: it computes the exact Hessian matrix of a given objective function. We describe two algorithms for feed-forward neural-network (NN) learning, with emphasis on how to organize Hessian elements into a so-called stagewise-partitioned block-arrow matrix form: (1) stagewise BP, an extension of the discrete-time optimal-control stagewise Newton of Dreyfus (1966); and (2) nodewise BP, based on a direct implementation of the chain rule for differentiation attributable to Bishop (1992). The former, a more systematic and cost-efficient implementation in both memory and operations, progresses in the same layer-by-layer (i.e., stagewise) fashion in which the widely employed first-order BP computes the gradient vector. We also show intriguing separable structures of each block in the partitioned Hessian, disclosing the rank of each block.

I. INTRODUCTION

In multi-stage optimal control problems, second-order optimization procedures (see [8] and references therein) proceed in a stagewise manner since $N$, the number of stages, is often very large. Naturally, those methods can be employed for optimizing multi-stage feed-forward neural networks. In this paper, we focus on an $N$-layered multilayer perceptron (MLP), which gives rise to an $N$-stage decision-making problem. At each stage $s$, we assume there are $P_s$ ($s = 1, \ldots, N$) states (or nodes) and $n_s$ ($s = 1, \ldots, N-1$) decision parameters (or weights), denoted by an $n_s$-vector $\theta^{s,s+1}$ (between layers $s$ and $s+1$). No decisions are to be made at the terminal stage $N$ (or layer $N$); hence, there are $N-1$ decision stages in total.

To compute the gradient vector for optimization purposes, we employ the "first-order" backpropagation (BP) process [5], [6], [7], which consists of two major procedures: a forward pass and a backward pass [see later Eq. (2)]. A forward-pass situation in MLP learning, where the node outputs in layer $s-1$ (denoted by $y^{s-1}$) affect the node outputs in the next layer $s$ (i.e., $y^{s}$) via connection parameters (denoted by $\theta^{s-1,s}$, between those two layers), can be interpreted as a situation in optimal control where state $y^{s-1}$ at stage $s-1$ is moved to state $y^{s}$ at the next stage $s$ by decisions $\theta^{s-1,s}$. In the backward pass, sensitivities of the objective function $E$ with respect to states (i.e., node sensitivities) are propagated from one stage back to another while computing gradient and Hessian elements. However, MLPs exhibit a great deal of structure, which turns out to be a very special case in optimal control; for instance, the "after-node" outputs (or states) are evaluated individually at each stage as $y^{s}_{j} = f^{s}_{j}(x^{s}_{j})$, where $f(\cdot)$ denotes a differentiable state-transition function of nonlinear dynamics, and $x^{s}_{j}$, the "before-node" net input to node $j$ at layer $s$, depends on only a subset of all decisions taken at stage $s-1$. In spite of this distinction and others, using a vector of states as a basic ingredient...
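As a concrete illustration of the first-order forward/backward mechanics just described (not of the second-order Hessian algorithms developed in this paper), the following minimal Python sketch propagates states $y^{s}$ forward through the $N-1$ decision stages and node sensitivities $\partial E/\partial y^{s}$ backward, accumulating the gradient with respect to each $\theta^{s,s+1}$. The function names, the bias-augmented weight layout, and the tanh transfer function are illustrative assumptions, not notation taken from the paper.

    import numpy as np

    def forward_pass(y1, thetas, f, fprime):
        """Propagate the stage-1 state y^1 through the N-1 decision stages.

        y1       : input-layer outputs (stage-1 states), a 1-D array
        thetas   : list of N-1 weight matrices; thetas[s] maps the bias-augmented
                   layer-(s+1) outputs to the layer-(s+2) net inputs
        f, fprime: nodewise transfer function and its derivative (vectorized)
        Returns per-stage net inputs x, outputs y, and derivatives f'(x).
        """
        ys, xs, dfs = [y1], [], []
        for W in thetas:
            x = W @ np.append(ys[-1], 1.0)   # "before-node" net inputs of the next layer
            xs.append(x)
            ys.append(f(x))                  # "after-node" outputs y = f(x)
            dfs.append(fprime(x))
        return xs, ys, dfs

    def backward_pass(ys, dfs, thetas, dE_dyN):
        """First-order BP: propagate node sensitivities dE/dy^s stage by stage
        and accumulate the gradient dE/dtheta^{s,s+1} at each decision stage."""
        grads = [None] * len(thetas)
        xi = dE_dyN                          # sensitivity at the terminal stage N
        for s in reversed(range(len(thetas))):
            delta = xi * dfs[s]              # dE/dx at the downstream layer
            grads[s] = np.outer(delta, np.append(ys[s], 1.0))
            xi = thetas[s][:, :-1].T @ delta # sensitivity passed back one stage
        return grads

    # Hypothetical usage: a 3-layer MLP (N = 3) with layer sizes 4 -> 5 -> 2 and
    # squared-error objective E = 0.5 * ||y^N - t||^2, so dE/dy^N = y^N - t.
    rng = np.random.default_rng(0)
    sizes = [4, 5, 2]
    thetas = [rng.standard_normal((sizes[s + 1], sizes[s] + 1))
              for s in range(len(sizes) - 1)]
    xs, ys, dfs = forward_pass(rng.standard_normal(sizes[0]), thetas,
                               np.tanh, lambda x: 1.0 - np.tanh(x) ** 2)
    grads = backward_pass(ys, dfs, thetas, ys[-1] - np.zeros(sizes[-1]))

In this sketch each weight matrix carries its bias as a last column, so $\theta^{s,s+1}$ maps the augmented state $[y^{s}; 1]$ to the before-node inputs of layer $s+1$; this is one common convention and is assumed here only for compactness.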