2007
DOI: 10.1007/s00365-006-0663-2

On Early Stopping in Gradient Descent Learning

Cited by 794 publications (545 citation statements) | References 46 publications
“…Starting from the seminal works of Tikhonov and Phillips (Phillips, 1962; Tikhonov & Arsenin, 1977), a number of regularization methods have been proposed in the literature to solve the deconvolution problem, e.g. truncated singular value decompositions (Hansen, 1987) and gradient-based techniques (Hanke, 1995; Nemirovskii, 1986; Yao, Rosasco, & Caponnetto, 2007). … regularization should be an important topic and area for system identification.…”
mentioning
confidence: 99%
“…The early stopping scheme suggests stopping training at the point (epoch) from which the cost-function value computed on a cross-validation set starts to rise [46, 58–60]. Similarly, adding noise (jitter) to the training patterns improves an FNN's generalization ability, and removing insignificant weights from a trained FNN improves its fault-tolerance ability [61].…”
Section: Generalization
mentioning
confidence: 99%
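The validation-based stopping rule quoted above can be made concrete with a short sketch. The following Python is illustrative only, not the cited works' implementation; the callables `train_epoch` and `val_loss` and the `patience` parameter are assumptions introduced here (patience-based stopping is a common variant of "stop when the validation cost starts to rise").

```python
def train_with_early_stopping(train_epoch, val_loss, max_epochs=100, patience=5):
    """Stop training once validation loss has not improved for `patience` epochs.

    train_epoch: callable running one gradient-descent pass over the training set.
    val_loss:    callable returning the current cost on the held-out validation set.
    (Both are hypothetical stand-ins, not functions from the cited papers.)
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch in range(max_epochs):
        train_epoch()
        loss = val_loss()
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch   # validation cost still falling
        elif epoch - best_epoch >= patience:
            break                                 # cost has started to rise: stop here
    return best_epoch, best_loss
```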
“…We have trained 155 DBNs using the standard configuration used by others [3]: 784 input nodes, 500 nodes in the first hidden layer, 1000 nodes in the second hidden layer, and 10 target nodes. We have used the benchmark MNIST data set with 20,000 training examples, 10,000 held-out examples used for early-stopping validation [39], and 10,000 test examples. Figure 3 shows for the MNIST data, as done for the DNA promoter data, a comparison between the test-set accuracy of the DBNs (model accuracy) and the test-set accuracy of the extracted rules.…”
Section: Algorithm 4 DBN Extract
mentioning
confidence: 99%
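For concreteness, a minimal sketch of the layer configuration and MNIST partition reported above. Only the sizes come from the quoted text; the variable names and the contiguous-slice split are assumptions made for illustration.

```python
# DBN layer sizes reported above: 784 inputs, hidden layers of 500 and
# 1000 nodes, and 10 target nodes (one per MNIST digit class).
LAYER_SIZES = [784, 500, 1000, 10]

# Reported MNIST partition: 20,000 train / 10,000 early-stopping
# validation / 10,000 test examples.
N_TRAIN, N_VALID, N_TEST = 20_000, 10_000, 10_000

def split_mnist(images, labels):
    """Partition 40,000 MNIST examples into train/validation/test sets.

    Hypothetical helper: the quoted text reports only the split sizes,
    not how the examples were ordered or selected.
    """
    i, j = N_TRAIN, N_TRAIN + N_VALID
    train = (images[:i], labels[:i])
    valid = (images[i:j], labels[i:j])
    test = (images[j:j + N_TEST], labels[j:j + N_TEST])
    return train, valid, test
```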