2007
DOI: 10.1007/s00365-006-0663-2

On Early Stopping in Gradient Descent Learning

Cited by 794 publications (545 citation statements) | References 46 publications
“…Starting from the seminal works of Tikhonov and Phillips (Phillips, 1962; Tikhonov & Arsenin, 1977), a number of regularization methods have been proposed in the literature to solve the deconvolution problem, e.g. truncated singular value decompositions (Hansen, 1987) and gradient-based techniques (Hanke, 1995; Nemirovskii, 1986; Yao, Rosasco, & Caponnetto, 2007). … regularization should be an important topic and area for system identification.…”
mentioning
confidence: 99%
“…The early stopping scheme suggests stopping training at the point (epoch) from which the cost-function value computed on a cross-validation set starts to rise [46, 58–60]. Similarly, adding noise (jitter) to the training patterns improves an FNN's generalization ability, and removing insignificant weights from a trained FNN improves its fault-tolerance ability [61].…”
Section: Generalization
mentioning
confidence: 99%
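The validation-based stopping rule quoted above can be made concrete with a short sketch. The following Python is illustrative only, not the cited works' implementation; the callables `train_epoch` and `val_loss` and the `patience` parameter are assumptions introduced here (patience-based stopping is a common variant of "stop when the validation cost starts to rise").

```python
def train_with_early_stopping(train_epoch, val_loss, max_epochs=100, patience=5):
    """Stop training once validation loss has not improved for `patience` epochs.

    train_epoch: callable running one gradient-descent pass over the training set.
    val_loss:    callable returning the current cost on the held-out validation set.
    (Both are hypothetical stand-ins, not functions from the cited papers.)
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch in range(max_epochs):
        train_epoch()
        loss = val_loss()
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch   # validation cost still falling
        elif epoch - best_epoch >= patience:
            break                                 # cost has started to rise: stop here
    return best_epoch, best_loss
```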
“…We have trained 155 DBNs using the standard configuration used by others [3]: 784 input nodes, 500 nodes in the first hidden layer, 1000 nodes in the second hidden layer, and 10 target nodes. We have used the benchmark MNIST data set with 20,000 training examples, 10,000 held-out examples used for early-stopping validation [39], and 10,000 test examples. Figure 3 shows for the MNIST data, as done for the DNA promoter data, a comparison between the test-set accuracy of the DBNs (model accuracy) and the test-set accuracy of the extracted rules.…”
Section: Algorithm 4 DBN Extract
mentioning
confidence: 99%
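For concreteness, a minimal sketch of the layer configuration and MNIST partition reported above. Only the sizes come from the quoted text; the variable names and the contiguous-slice split are assumptions made for illustration.

```python
# DBN layer sizes reported above: 784 inputs, hidden layers of 500 and
# 1000 nodes, and 10 target nodes (one per MNIST digit class).
LAYER_SIZES = [784, 500, 1000, 10]

# Reported MNIST partition: 20,000 train / 10,000 early-stopping
# validation / 10,000 test examples.
N_TRAIN, N_VALID, N_TEST = 20_000, 10_000, 10_000

def split_mnist(images, labels):
    """Partition 40,000 MNIST examples into train/validation/test sets.

    Hypothetical helper: the quoted text reports only the split sizes,
    not how the examples were ordered or selected.
    """
    i, j = N_TRAIN, N_TRAIN + N_VALID
    train = (images[:i], labels[:i])
    valid = (images[i:j], labels[i:j])
    test = (images[j:j + N_TEST], labels[j:j + N_TEST])
    return train, valid, test
```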