Convergence of Gradient Method With Momentum for Two-Layer Feedforward Neural Networks

Zhang, Naimin; Wu, Wei; Zheng, Guojun

doi:10.1109/tnn.2005.863460

Cited by 68 publications

(33 citation statements)

References 11 publications


“…Notice that (24) represents a stochastic counterpart of (19).) By virtue of the well-known Doob's theorem [25], the property (24) yields …”

Section: Preliminaries (mentioning)

confidence: 99%

“…Unfortunately, this gives that the learning goes faster in the beginning and slows down in the late stage. The convergence analysis of learning algorithm with deterministic (non-stochastic) nature has been given in [17][18][19][20][21][22]. In contrast to the stochastic approach, several of these results allow to employ a constant learning rate [19,23].…”

Section: Introduction (mentioning)

confidence: 99%

“…The convergence analysis of learning algorithm with deterministic (non-stochastic) nature has been given in [17][18][19][20][21][22]. In contrast to the stochastic approach, several of these results allow to employ a constant learning rate [19,23]. However, they assume that learning set must be finite whereas in online identification schemes, this set is theoretically infinite.…”

Section: Introduction (mentioning)

confidence: 99%


Asymptotic Behaviour of Gradient Learning Algorithms in Neural Network Models for the Identification of Nonlinear Systems

Azarskov, Kucherov, Nikolaienko, et al., 2015

AJNNA


This paper studies the asymptotic properties of multilayer neural network models used for the adaptive identification of a wide class of nonlinearly parameterized systems in a stochastic environment. To adjust the neural network's weights, standard online gradient-type learning algorithms are employed. The learning set is assumed to be infinite but bounded. A Lyapunov-like tool is used to analyze the ultimate behaviour of the learning processes in the presence of stochastic input variables. New sufficient conditions guaranteeing the global convergence of these algorithms in the stochastic framework are derived. Their main feature is that they require no penalty term to achieve boundedness of the weight sequence. To demonstrate the asymptotic behaviour of the learning algorithms and support the theoretical results, some simulation examples are also given.
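The abstract describes online (sample-by-sample) gradient learning from an infinite but bounded stream of stochastic inputs. The following is a minimal sketch of that setting, not the paper's algorithm or its convergence conditions: it assumes a one-input, one-output network with tanh hidden units and a linear output, a constant learning rate `eta`, and a hypothetical plant `0.5 * sin(2x)` observed with small Gaussian noise. All function and parameter names are illustrative.

```python
import math
import random


def train_online(num_steps=20000, hidden=8, eta=0.05, seed=0):
    """Online gradient learning for a two-layer network
    y_hat(x) = sum_j v[j] * tanh(w[j] * x + b[j]).
    Network size, eta, and the plant below are assumptions for illustration."""
    rng = random.Random(seed)
    w = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]  # input-to-hidden weights
    b = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]  # hidden biases
    v = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]  # hidden-to-output weights

    def plant(x):
        # Hypothetical nonlinear system to be identified (not from the paper).
        return 0.5 * math.sin(2.0 * x)

    def predict(x):
        return sum(v[j] * math.tanh(w[j] * x + b[j]) for j in range(hidden))

    for _ in range(num_steps):
        x = rng.uniform(-1.0, 1.0)           # bounded stochastic input
        y = plant(x) + rng.gauss(0.0, 0.01)  # noisy measurement of the plant
        h = [math.tanh(w[j] * x + b[j]) for j in range(hidden)]
        e = sum(v[j] * h[j] for j in range(hidden)) - y  # prediction error
        # One gradient step on 0.5 * e**2 for this single sample.
        for j in range(hidden):
            dh = 1.0 - h[j] ** 2          # derivative of tanh
            grad_v = e * h[j]
            grad_w = e * v[j] * dh * x    # uses v[j] before it is updated
            grad_b = e * v[j] * dh
            v[j] -= eta * grad_v
            w[j] -= eta * grad_w
            b[j] -= eta * grad_b
    return predict, plant
```

Calling `predict, plant = train_online()` and comparing `predict(x)` with `plant(x)` on a grid over [-1, 1] gives a quick check of how well the weights have settled. A constant `eta` is used here for simplicity; a decreasing step-size schedule is the other common choice in stochastic analyses.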


“…Once the error gradient is derived, numerous optimization algorithms for minimizing can be applied to train PyraNet [20]- [22]. In this paper, we focus on five representative training algorithms, namely gradient descent (GD) [23], gradient descent with momentum and variable learning rate (GDMV) [24], resilient backpropagation (RPROP) [25], conjugate gradient (CG) [20], and Levenberg-Marquardt (LM) [26].…”

Section: B. PyraNet Training Algorithms (mentioning)

confidence: 99%

A Pyramidal Neural Network For Visual Pattern Recognition

Phung, Bouzerdoum, 2007

IEEE Trans. Neural Netw.

110


In this paper, we propose a new neural architecture for classification of visual patterns that is motivated by the two concepts of image pyramids and local receptive fields. The new architecture, called pyramidal neural network (PyraNet), has a hierarchical structure with two types of processing layers: pyramidal layers and one-dimensional (1-D) layers. In the new network, nonlinear two-dimensional (2-D) neurons are trained to perform both image feature extraction and dimensionality reduction. We present and analyze five training methods for PyraNet [gradient descent (GD), gradient descent with momentum, resilient backpropagation (RPROP), Polak-Ribiere conjugate gradient (CG), and Levenberg-Marquardt (LM)] and two choices of error functions [mean-square-error (mse) and cross-entropy (CE)]. In this paper, we apply PyraNet to determine gender from a facial image, and compare its performance on the standard facial recognition technology (FERET) database with three classifiers: the convolutional neural network (NN), the k-nearest neighbor (k-NN), and the support vector machine (SVM).
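Gradient descent with momentum, the method named in the cited paper's title and one of the PyraNet training options, adds a fraction of the previous weight change to the current gradient step. Below is a minimal sketch of the common heavy-ball form, run in batch mode on a hypothetical ill-conditioned quadratic error surface. The surface, the fixed momentum coefficient, and the stopping tolerance are assumptions; the cited paper's exact update and its convergence conditions are not reproduced here.

```python
import math


def gradient(w):
    # Gradient of E(w) = 0.5 * (1 * w[0]**2 + 50 * w[1]**2), an ill-conditioned
    # quadratic standing in for an error surface (an illustrative assumption).
    return [1.0 * w[0], 50.0 * w[1]]


def descend(eta, momentum, tol=1e-6, max_iter=10_000):
    """Batch gradient descent with heavy-ball momentum:
        dw_k = -eta * grad E(w_k) + momentum * dw_{k-1},   w_{k+1} = w_k + dw_k.
    momentum = 0 reduces to plain gradient descent.
    Returns the number of iterations until ||w|| < tol (or max_iter)."""
    w = [1.0, 1.0]
    dw = [0.0, 0.0]  # previous weight change, zero before the first step
    for k in range(1, max_iter + 1):
        g = gradient(w)
        dw = [-eta * gi + momentum * di for gi, di in zip(g, dw)]
        w = [wi + di for wi, di in zip(w, dw)]
        if math.hypot(w[0], w[1]) < tol:
            return k
    return max_iter
```

On this surface `descend(0.03, 0.8)` should need noticeably fewer iterations than `descend(0.03, 0.0)`: with `eta = 0.03` plain gradient descent crawls along the shallow direction `w[0]`, and the accumulated velocity speeds that up. This is a property of the toy problem, not a general guarantee.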


“…These techniques include such idea as varying the learning rate, using momentum and gain tuning of activation function. In [16] some convergence results are given where the learning fashion of training examples is batch learning. These results are of global nature in that they are valid for any arbitrarily given initial value of weights.…”

mentioning

confidence: 99%

The Effect of Adaptive Gain and Adaptive Momentum in Improving Training Time of Gradient Descent Back Propagation Algorithm on Classification Problems

Hamid, Nawi, Ghazali, 2011

International Journal on Advanced Science, Engineering and Information Technology


Abstract: The back propagation algorithm has been successfully applied to a wide range of practical problems. Since this algorithm uses a gradient descent method, it has some limitations: slow convergence and a tendency to converge to local minima. The convergence behaviour of the back propagation algorithm depends on the choice of initial weights and biases, network topology, learning rate, momentum, activation function, and the value of the gain in the activation function. Previous researchers demonstrated that in 'feed forward' algorithms, the slope of the activation function is directly influenced by a parameter referred to as 'gain'. This research proposes an algorithm that improves the performance of the current back propagation algorithm, the Gradient Descent Method with Adaptive Gain, by changing the momentum coefficient adaptively for each node. The influence of the adaptive momentum together with adaptive gain on the learning ability of a neural network is analysed on multilayer feed forward neural networks. A physical interpretation of the relationship between the momentum value, the learning rate, and the weight values is given. The efficiency of the proposed algorithm, compared with the conventional Gradient Descent Method and the current Gradient Descent Method with Adaptive Gain, was verified by means of simulation on three benchmark problems. In learning the patterns, the simulation results demonstrate that the proposed algorithm converged faster, with an improvement ratio of nearly 1.8 on Wisconsin breast cancer, 6.6 on the Mushroom problem, and 36% on the Soybean data set. The results clearly show that the proposed algorithm significantly improves the learning speed of the current gradient descent back-propagation algorithm.
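The abstract refers to a 'gain' parameter that sets the slope of the activation function, but it does not state the update rules. As an illustration only, assuming a logistic activation f(net) = 1 / (1 + exp(-c * net)) with gain c, the sketch below gives the two partial derivatives needed to treat the gain as a trainable parameter and runs plain gradient descent on the weight and gain of a single hypothetical neuron. The paper's adaptive-momentum rule is not reproduced.

```python
import math


def sigmoid(net, gain=1.0):
    """Logistic activation with a slope ('gain') parameter c: 1 / (1 + exp(-c * net))."""
    return 1.0 / (1.0 + math.exp(-gain * net))


def sigmoid_grads(net, gain):
    """Analytic partial derivatives of sigmoid(net, gain):
        d f / d net  = gain * f * (1 - f)
        d f / d gain = net  * f * (1 - f)"""
    f = sigmoid(net, gain)
    return gain * f * (1.0 - f), net * f * (1.0 - f)


def train_gain(x, target, w=0.5, gain=1.0, eta=0.5, steps=200):
    """Gradient descent on both the weight w and the gain of one neuron
    y = sigmoid(w * x, gain), minimising 0.5 * (y - target)**2.
    The neuron, eta, and step count are illustrative assumptions."""
    for _ in range(steps):
        net = w * x
        y = sigmoid(net, gain)
        d_net, d_gain = sigmoid_grads(net, gain)  # both use the current w and gain
        err = y - target
        w -= eta * err * d_net * x    # chain rule through net = w * x
        gain -= eta * err * d_gain    # the gain is updated like any other parameter
    return w, gain, sigmoid(w * x, gain)
```

A central finite difference on `sigmoid` is a cheap way to confirm `sigmoid_grads` before using it in a larger network. Because the gain multiplies the net input, increasing it steepens the activation in the same way as scaling all incoming weights of that node.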
