1998
DOI: 10.1142/s0218488598000100

Speeding Up Backpropagation Algorithms by Using Cross-Entropy Combined with Pattern Normalization

Abstract: This paper demonstrates how the backpropagation algorithm (BP) and its variants can be accelerated significantly while the quality of the trained networks increases. Two modifications are proposed: first, instead of the usual quadratic error we use the cross-entropy as the error function, and second, we normalize the input patterns. The first modification eliminates the so-called sigmoid prime factor from the update rule for the output units. In order to balance the dynamic range of the inputs we use normalization…
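Below is a minimal sketch of the two modifications the abstract describes: cross-entropy error at sigmoid output units, which removes the sigmoid prime factor from the output deltas, and normalization of the input patterns. This is not the authors' code; the network shape, learning rate, and all function names are illustrative assumptions.

import numpy as np

def normalize_patterns(X):
    """Scale each input feature to zero mean and unit variance so the
    dynamic ranges of the inputs are balanced."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(X, T, W1, W2, lr=0.1):
    """One batch update of a one-hidden-layer net trained with cross-entropy.

    With cross-entropy and sigmoid output units the output delta reduces to
    (Y - T): the sigmoid prime factor Y * (1 - Y) cancels, so learning does
    not stall when the output units saturate."""
    H = sigmoid(X @ W1)                              # hidden activations
    Y = sigmoid(H @ W2)                              # output activations
    delta_out = Y - T                                # no sigmoid prime factor here
    delta_hid = (delta_out @ W2.T) * H * (1.0 - H)   # hidden units keep the factor
    W2 -= lr * H.T @ delta_out / len(X)
    W1 -= lr * X.T @ delta_hid / len(X)
    return W1, W2

With the quadratic error, delta_out would carry the extra factor Y * (1 - Y); avoiding it at the output units is the speed-up the abstract refers to.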

Cited by 21 publications (24 citation statements)
References: 0 publications
“…The error percentage [12], defined as 100 · E_MSE(w) / (t_max − t_min)², where t_max and t_min denote the maximum and minimum target value, respectively, is used to display the results in this section. However, cross-entropy based error functions may be more suitable for classification tasks and work also well in combination with Rprop [8].…”
Section: Experimental Evaluation (mentioning)
Confidence: 99%
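For concreteness, the error percentage quoted above can be computed as in the following sketch; it only illustrates the quoted formula, and the function and variable names are ours.

import numpy as np

def error_percentage(outputs, targets):
    """100 * E_MSE / (t_max - t_min)^2, as quoted in the statement above."""
    e_mse = np.mean((outputs - targets) ** 2)
    t_max, t_min = targets.max(), targets.min()
    return 100.0 * e_mse / (t_max - t_min) ** 2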
“…for a sigmoid squashing function, and can be removed on output nodes when using cross-entropy (Joost & Schiffmann, 1998). To illustrate how CB training works, consider a three-class problem.…”
Section: CB1 Error Function (mentioning)
Confidence: 99%
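The cancellation this statement refers to is a standard result; in our notation (not taken from either paper), with output activation y = σ(a) and target t, the cross-entropy error gives

\begin{align*}
E &= -\bigl[t\ln y + (1-t)\ln(1-y)\bigr], \qquad y = \sigma(a),\\
\frac{\partial E}{\partial a}
  &= \frac{\partial E}{\partial y}\,\sigma'(a)
   = \frac{y-t}{y(1-y)}\,\bigl[y(1-y)\bigr] = y - t,
\end{align*}

so the sigmoid prime factor y(1 − y) drops out of the output-node update, whereas under the quadratic error it remains and can drive the update toward zero when the output saturates.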
“…(12), at each node is multiplied by the sigmoid prime factor [15], G′(x_j^(l); p); it is a nullity if x_j^(l) is outside the input active region. The network with tanh activation function encounters a very similar problem because this factor becomes small when x_j^(l) is large.…”
Section: The P-recursive Piecewise Polynomial Network (mentioning)
Confidence: 99%
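The vanishing factor this statement describes is easy to check numerically; the following sketch is purely illustrative and not taken from the cited paper.

import numpy as np

def sigmoid_prime(a):
    y = 1.0 / (1.0 + np.exp(-a))
    return y * (1.0 - y)          # at most 0.25, reached at a = 0

def tanh_prime(a):
    return 1.0 - np.tanh(a) ** 2  # at most 1.0, reached at a = 0

for a in (0.0, 2.0, 5.0, 10.0):
    print(f"a = {a:5.1f}   sigmoid' = {sigmoid_prime(a):.2e}   tanh' = {tanh_prime(a):.2e}")

# For large |a| both derivatives approach zero, so any weight update that is
# multiplied by this factor becomes negligible -- the saturation problem the
# quoted statement refers to.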