1998
DOI: 10.1142/s0218488598000100

Speeding Up Backpropagation Algorithms by Using Cross-Entropy Combined with Pattern Normalization

Abstract: This paper demonstrates how the backpropagation algorithm (BP) and its variants can be accelerated significantly while the quality of the trained networks increases. Two modifications are proposed: first, instead of the usual quadratic error we use the cross-entropy as the error function, and second, we normalize the input patterns. The first modification eliminates the so-called sigmoid prime factor from the update rule for the output units. In order to balance the dynamic range of the inputs we use normalization…
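Below is a minimal sketch of the two modifications the abstract describes: cross-entropy error at sigmoid output units, which removes the sigmoid prime factor from the output deltas, and normalization of the input patterns. This is not the authors' code; the network shape, learning rate, and all function names are illustrative assumptions.

import numpy as np

def normalize_patterns(X):
    """Scale each input feature to zero mean and unit variance so the
    dynamic ranges of the inputs are balanced."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(X, T, W1, W2, lr=0.1):
    """One batch update of a one-hidden-layer net trained with cross-entropy.

    With cross-entropy and sigmoid output units the output delta reduces to
    (Y - T): the sigmoid prime factor Y * (1 - Y) cancels, so learning does
    not stall when the output units saturate."""
    H = sigmoid(X @ W1)                              # hidden activations
    Y = sigmoid(H @ W2)                              # output activations
    delta_out = Y - T                                # no sigmoid prime factor here
    delta_hid = (delta_out @ W2.T) * H * (1.0 - H)   # hidden units keep the factor
    W2 -= lr * H.T @ delta_out / len(X)
    W1 -= lr * X.T @ delta_hid / len(X)
    return W1, W2

With the quadratic error, delta_out would carry the extra factor Y * (1 - Y); avoiding it at the output units is the speed-up the abstract refers to.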

Cited by 21 publications (24 citation statements)
References: 0 publications
“…The error percentage [12], defined as 100 · E_MSE(w) / (t_max − t_min)², where t_max and t_min denote the maximum and minimum target value, respectively, is used to display the results in this section. However, cross-entropy based error functions may be more suitable for classification tasks and work also well in combination with Rprop [8].…”
Section: Experimental Evaluation (mentioning)
Confidence: 99%
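For concreteness, the error percentage quoted above can be computed as in the following sketch; it only illustrates the quoted formula, and the function and variable names are ours.

import numpy as np

def error_percentage(outputs, targets):
    """100 * E_MSE / (t_max - t_min)^2, as quoted in the statement above."""
    e_mse = np.mean((outputs - targets) ** 2)
    t_max, t_min = targets.max(), targets.min()
    return 100.0 * e_mse / (t_max - t_min) ** 2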
“…for a sigmoid squashing function, and can be removed on output nodes when using cross-entropy (Joost & Schiffmann, 1998). To illustrate how CB training works, consider a three-class problem.…”
Section: CB1 Error Function (mentioning)
Confidence: 99%
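The cancellation this statement refers to is a standard result; in our notation (not taken from either paper), with output activation y = σ(a) and target t, the cross-entropy error gives

\begin{align*}
E &= -\bigl[t\ln y + (1-t)\ln(1-y)\bigr], \qquad y = \sigma(a),\\
\frac{\partial E}{\partial a}
  &= \frac{\partial E}{\partial y}\,\sigma'(a)
   = \frac{y-t}{y(1-y)}\,\bigl[y(1-y)\bigr] = y - t,
\end{align*}

so the sigmoid prime factor y(1 − y) drops out of the output-node update, whereas under the quadratic error it remains and can drive the update toward zero when the output saturates.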
“…(12), at each node is multiplied by the sigmoid prime factor [15], G′(x_j^(l); p); it is a nullity if x_j^(l) is outside the input active region. The network with tanh activation function encounters a very similar problem because this factor becomes small when x_j^(l) is large.…”
Section: The P-recursive Piecewise Polynomial Network (mentioning)
Confidence: 99%
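The vanishing factor this statement describes is easy to check numerically; the following sketch is purely illustrative and not taken from the cited paper.

import numpy as np

def sigmoid_prime(a):
    y = 1.0 / (1.0 + np.exp(-a))
    return y * (1.0 - y)          # at most 0.25, reached at a = 0

def tanh_prime(a):
    return 1.0 - np.tanh(a) ** 2  # at most 1.0, reached at a = 0

for a in (0.0, 2.0, 5.0, 10.0):
    print(f"a = {a:5.1f}   sigmoid' = {sigmoid_prime(a):.2e}   tanh' = {tanh_prime(a):.2e}")

# For large |a| both derivatives approach zero, so any weight update that is
# multiplied by this factor becomes negligible -- the saturation problem the
# quoted statement refers to.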