1994
DOI: 10.1109/72.317727

Weight smoothing to improve network generalization

Abstract: A weight smoothing algorithm is proposed in this paper to improve a neural network's generalization capability. The algorithm can be used when the data patterns to be classified are presented on an n-dimensional grid (n ≥ 1) and there exist correlations among neighboring data points within a pattern. For a fully interconnected feedforward net, no such correlation information is embedded in the architecture. Consequently, the correlations can only be extracted through a sufficient amount of network training…
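
The truncated abstract does not show the algorithm's exact formulation, but the idea it describes, encouraging the weights attached to neighboring grid points to take similar values, can be sketched as a quadratic first-difference penalty. Everything below (the function names, the lam coefficient, the 1-D grid layout) is an illustrative assumption, not the paper's implementation:

```python
import numpy as np

def weight_smoothing_penalty(W, lam=1e-3):
    """Smoothness penalty on a weight matrix W of shape
    (n_hidden, n_inputs), where the input axis indexes points on a
    1-D grid (the n = 1 case). Because neighboring grid points are
    assumed correlated, the incoming weights of each hidden unit are
    encouraged to vary smoothly along the grid. Illustrative sketch,
    not the paper's actual algorithm."""
    diffs = W[:, 1:] - W[:, :-1]        # first differences along the grid
    return lam * np.sum(diffs ** 2)

def weight_smoothing_gradient(W, lam=1e-3):
    """Gradient of the penalty w.r.t. W, to be added to the data-term
    gradient during backpropagation."""
    g = np.zeros_like(W)
    diffs = W[:, 1:] - W[:, :-1]
    g[:, 1:] += 2.0 * lam * diffs       # d/dW[:, j+1] of each squared difference
    g[:, :-1] -= 2.0 * lam * diffs      # d/dW[:, j] of each squared difference
    return g
```

In this reading, the penalty plays the same role as weight decay, except that it shrinks differences between neighboring weights rather than the weights themselves.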

Cited by 37 publications (14 citation statements). References 18 publications.

“…This concept has been explored in a variety of formulations, e.g. by means of weight decay [21], weight smoothing [19], label smoothing [38], or penalizing the norm of the output derivative with respect to the network weights [14]. Of particular interest to our problem are methods that regularize by penalizing the norm of the Jacobian with respect to the input [28,39].…”
Section: Background and Previous Work
confidence: 99%
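
As a rough illustration of the input-Jacobian regularization mentioned at the end of this excerpt (and not the specific methods of [28,39]), here is a minimal sketch for a one-layer tanh net, where the Jacobian is available in closed form; the function name and the model are assumptions:

```python
import numpy as np

def input_jacobian_penalty(W, x, lam=1e-3):
    """Squared Frobenius norm of the Jacobian dy/dx for a one-layer
    net y = tanh(W @ x). Since dy/dx = diag(1 - y**2) @ W, the
    Jacobian is W with each row scaled by the tanh derivative.
    Penalizing its norm discourages outputs that react strongly to
    small input perturbations. Illustrative sketch only."""
    y = np.tanh(W @ x)
    J = (1.0 - y ** 2)[:, None] * W     # row-wise scaling by tanh'(W @ x)
    return lam * np.sum(J ** 2)
```
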
“…When the m.s.e. decreases, the adaptive learning rate η(n) increases according to (13). The enlarged η(n) accelerates the decrease of the error.…”
Section: Simulation Results
confidence: 99%
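
The excerpt's equation (13) is not reproduced in this snippet, so the following sketch only illustrates the stated behavior, growing the learning rate η(n) while the m.s.e. falls; the grow and shrink factors are placeholder assumptions:

```python
def adapt_learning_rate(eta, mse, prev_mse, grow=1.05, shrink=0.7):
    """Grow eta while the m.s.e. is falling and shrink it when the
    m.s.e. rises. The cited paper's actual update rule, its equation
    (13), is not reproduced here; grow/shrink are placeholders."""
    if mse < prev_mse:
        return eta * grow    # error decreasing: take larger steps
    return eta * shrink      # error increasing: back off
```
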
“…To avoid local minima, our training algorithm is gradient descent with momentum [9] and an adaptive learning rate [10], used to find a set of weights that minimizes the MSE. The weights are initialized to random values between -1 and 1.…”
Section: Training Algorithm
confidence: 99%
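
A minimal sketch of the training setup this excerpt describes, gradient descent with momentum plus uniform random initialization in [-1, 1]; the step size and momentum coefficient are illustrative assumptions, and the excerpt's adaptive-learning-rate rule [10] is not reproduced:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_weights(shape):
    """Random initial weights drawn uniformly from [-1, 1], as in
    the quoted passage."""
    return rng.uniform(-1.0, 1.0, size=shape)

def momentum_step(w, grad, velocity, eta=0.01, mu=0.9):
    """One gradient-descent-with-momentum update on the MSE: the
    momentum term mu carries past descent directions forward, which
    helps the search move through shallow local minima."""
    velocity = mu * velocity - eta * grad
    return w + velocity, velocity
```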