1999
DOI: 10.1162/089976699300016223

Improving the Convergence of the Backpropagation Algorithm Using Learning Rate Adaptation Methods

Abstract: This article focuses on gradient-based backpropagation algorithms that use either a common adaptive learning rate for all weights or an individual adaptive learning rate for each weight and apply the Goldstein/Armijo line search. The learning-rate adaptation is based on descent techniques and estimates of the local Lipschitz constant that are obtained without additional error function and gradient evaluations. The proposed algorithms improve the backpropagation training in terms of both convergence rate and co…
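
The following Python sketch illustrates the idea summarized in the abstract: a learning rate derived from a local Lipschitz constant that is estimated from quantities the previous iteration already produced (the last weight vector and gradient), safeguarded by an Armijo-style sufficient-decrease test. It is a minimal sketch of the general technique under stated assumptions, not the paper's exact algorithm; the function names, the factor 1/(2L), and the backtracking parameters are illustrative choices.

    import numpy as np

    def lipschitz_learning_rate(w, w_prev, g, g_prev, eta_default=0.01):
        # Local Lipschitz estimate L ~ ||g - g_prev|| / ||w - w_prev||,
        # computed without extra error-function or gradient evaluations.
        dw_norm = np.linalg.norm(w - w_prev)
        dg_norm = np.linalg.norm(g - g_prev)
        if dw_norm == 0.0 or dg_norm == 0.0:
            return eta_default
        return 1.0 / (2.0 * (dg_norm / dw_norm))   # larger curvature -> smaller step

    def armijo_safeguarded_step(w, w_prev, g, g_prev, loss_fn, sigma=1e-4, beta=0.5):
        # Start from the Lipschitz-based learning rate, then backtrack
        # (Goldstein/Armijo style) until the error decreases sufficiently.
        eta = lipschitz_learning_rate(w, w_prev, g, g_prev)
        E = loss_fn(w)
        gg = float(np.dot(g, g))
        for _ in range(30):                        # cap the number of backtracks
            if loss_fn(w - eta * g) <= E - sigma * eta * gg:
                break
            eta *= beta
        return w - eta * g

Here w, w_prev, g, g_prev stand for the flattened weight and gradient vectors of the network at the current and previous epoch, and loss_fn is the batch error function.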

Cited by 120 publications (77 citation statements)
References 23 publications
“…For this study, we used the implementations in the Xfuzzy environment; see [39] for a more detailed description of the wide range of methods supported. Among them, we distinguish four classes of methods: gradient descent [32], conjugate gradient, second-order or quasi-Newton [3], and algorithms with no derivatives. Table 5 shows the test errors for the best option from each of the first three classes of algorithms: Resilient Propagation (Rprop) [42,32], from the gradient descent class; Scaled Conjugate Gradient (SCG) [35], from the conjugate gradient class; and Levenberg-Marquardt (L-M) [3], from the second-order class of methods.…”
Section: B. Comparison of Different Neuro-Fuzzy Methods (mentioning)
confidence: 99%
“…It is based on the idea of function comparison methods (Scales, 1985), taking into account E(t − 1) < E(t), and exploits the signs of the gradient values. The parameter q is a reduction factor that is used to update the midpoint of the considered interval; the choice of q has an influence on the number of error function evaluations required to obtain an acceptable weight vector (Magoulas et al., 1999)…”
Section: Implementation of the Jrprop (mentioning)
confidence: 99%
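
The excerpt above describes a function-comparison rule: a candidate step is accepted only when the error decreases, otherwise the step is shrunk by the reduction factor q, and every rejected candidate costs one extra error-function evaluation. The Python sketch below shows that control flow under simplifying assumptions (a single global step size and a sign-based update); it is not the actual Jrprop procedure, and the default q and iteration cap are illustrative.

    import numpy as np

    def comparison_based_update(w, g, loss_fn, eta, q=0.5, max_reductions=10):
        # Accept a candidate step only if the error decreases, i.e. avoid
        # ending up with E(t-1) < E(t); otherwise shrink the step by q.
        E_prev = loss_fn(w)
        for _ in range(max_reductions):
            w_new = w - eta * np.sign(g)      # use only the signs of the gradient
            if loss_fn(w_new) < E_prev:       # error decreased: accept the step
                return w_new, eta
            eta *= q                          # reduction factor q shrinks the interval
        return w, eta                         # no acceptable step found: keep w

A smaller q shrinks the step aggressively and tends to reach an acceptable weight vector in fewer comparisons, at the price of shorter steps; a q closer to 1 keeps longer steps but may need more error-function evaluations.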
“…Adaptive gradient-based algorithms with individual step-sizes try to overcome the inherent difficulty of choosing the right learning rates for each region of the search space, which depends on the application (Magoulas et al., 1997, 1999). This is done by controlling the update of each weight in order to minimize oscillations and maximize the length of the step-size.…”
Section: Introduction (mentioning)
confidence: 99%
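
The excerpt above describes per-weight step-size control: a weight's individual step grows while its partial derivative keeps the same sign (to maximize the step length) and shrinks when the sign flips (to damp oscillations). The sketch below is a generic Rprop-style illustration of that rule; the increase/decrease factors and the step bounds are conventional defaults, not values taken from the cited papers.

    import numpy as np

    def per_weight_step_update(w, g, g_prev, delta,
                               inc=1.2, dec=0.5, dmin=1e-6, dmax=50.0):
        # Grow each weight's individual step while its gradient keeps its
        # sign, shrink it on a sign change, and move every weight against
        # the sign of its current partial derivative.
        same_sign = g * g_prev > 0
        flipped = g * g_prev < 0
        delta = np.where(same_sign, np.minimum(delta * inc, dmax), delta)
        delta = np.where(flipped, np.maximum(delta * dec, dmin), delta)
        step = -np.sign(g) * delta
        step = np.where(flipped, 0.0, step)    # suppress the update right after a sign flip
        return w + step, delta

All arrays are element-wise NumPy vectors over the weights; delta holds the per-weight step sizes carried from one epoch to the next.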
“…Much research has been carried out on increasing the convergence speed of the EBP algorithm in MLP neural networks; see [2,3,8,7,13,12,14]. In [2,3], Abid and Fnaiech summarize the approaches for increasing the convergence speed of EBP into seven categories: the weight-updating procedure, the choice of optimization criterion, the use of adaptive parameters, the estimation of optimal initial conditions, pre-processing of the problem before using the MLP, optimization of the MLP structure, and the use of more advanced algorithms.…”
Section: Introduction (mentioning)
confidence: 99%
“…In this paper we concentrate on a dynamic learning rate for updating the network weights, similar to what was implemented in [7,8,12]. We therefore implemented a Variable Step Size (VSS) method that accelerates convergence of the algorithm by reducing the number of learning epochs.…”
Section: Introduction (mentioning)
confidence: 99%