On-Line Learning in Neural Networks 1999
DOI: 10.1017/cbo9780511569920.003

On-line Learning and Stochastic Approximations

Abstract: The convergence of online learning algorithms is analyzed using the tools of stochastic approximation theory and proved under very weak conditions. A general framework for online learning algorithms is first presented. This framework encompasses the most common online learning algorithms in use today, as illustrated by several examples. Stochastic approximation theory then provides general results describing the convergence of all these learning algorithms at once.
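As a concrete illustration of the kind of algorithm the framework covers, the sketch below implements a generic online update of the form w_{t+1} = w_t - gamma_t * H(z_t, w_t) in Python. The loop structure, the least-mean-squares example, and all names are illustrative assumptions, not code from the chapter.

import numpy as np

def online_learn(update_direction, sample_stream, w0, gammas):
    """Generic online update: w_{t+1} = w_t - gamma_t * H(z_t, w_t)."""
    w = np.asarray(w0, dtype=float)
    for gamma, z in zip(gammas, sample_stream):
        w = w - gamma * update_direction(z, w)
    return w

# Example: online least-mean-squares, where H((x, y), w) = (w.x - y) * x.
def lms_stream(n, w_true, rng):
    for _ in range(n):
        x = rng.normal(size=w_true.shape)
        yield x, x.dot(w_true) + 0.01 * rng.normal()

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])
gammas = (0.1 / (1.0 + 0.01 * t) for t in range(5000))
w_hat = online_learn(lambda z, w: (w.dot(z[0]) - z[1]) * z[0],
                     lms_stream(5000, w_true, rng),
                     np.zeros(2), gammas)
print(w_hat)  # approaches w_true as the stream grows

With a decreasing step-size sequence gammas, this is exactly the kind of stochastic update whose convergence the chapter's stochastic approximation results cover.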

Cited by 707 publications (870 citation statements). References 25 publications.
“…Stochastic gradient descent: The Pegasos algorithm is an application of a stochastic sub-gradient method (see for example [25,34]). In the context of machine learning problems, the efficiency of the stochastic gradient approach has been studied in [26,1,3,27,6,5]. In particular, it has been claimed and experimentally observed that, "Stochastic algorithms yield the best generalization performance despite being the worst optimization algorithms".…”
Section: Introduction (mentioning; confidence: 99%)
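For concreteness, here is a minimal sketch of the stochastic sub-gradient step that Pegasos-style solvers apply to the L2-regularized hinge loss. The objective, the 1/(lambda*t) step size, the toy data, and all names are illustrative assumptions rather than the exact algorithm of the cited works.

import numpy as np

def pegasos_style_step(w, x_i, y_i, lam, t):
    """One stochastic sub-gradient step on (lam/2)*||w||^2 + max(0, 1 - y_i * <w, x_i>)."""
    eta = 1.0 / (lam * t)            # Pegasos-style decreasing step size
    grad = lam * w                   # gradient of the regularizer
    if y_i * x_i.dot(w) < 1.0:       # hinge term is active at this example
        grad = grad - y_i * x_i      # sub-gradient of the hinge loss
    return w - eta * grad

# Toy usage on linearly separable data drawn here only for illustration.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)
w, lam = np.zeros(2), 0.1
for t in range(1, 20001):
    i = rng.integers(len(y))         # sample one example per iteration
    w = pegasos_style_step(w, X[i], y[i], lam, t)
print(w, np.mean(np.sign(X.dot(w)) == y))  # learned weights and training accuracy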
“…where B is a batch sampled from X and |B| is the batch size, η is the learning rate and t is the iteration index [2]. These methods can be interpreted as gradient descent using noisy gradients, which are often referred to as mini-batch gradients with the specified batch size.…”
Section: Introduction (mentioning; confidence: 99%)
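The equation this quote refers to did not survive extraction; the update the authors appear to describe is the standard mini-batch stochastic gradient step, reconstructed here as an assumption:

w_{t+1} = w_t - \frac{\eta}{|B|} \sum_{x \in B} \nabla_w \ell(w_t, x)

where η is the learning rate, B the batch sampled at iteration t, and ∇_w ℓ the per-example gradient of the loss.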
“…It can be shown (see e.g. [18]) that SGD converges almost surely towards the optimum, if the learning rates fulfill…”
Section: Stochastic Gradient Descent (mentioning; confidence: 99%)
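Although the quote is truncated, the learning-rate conditions invoked in this stochastic approximation setting are typically the classical Robbins-Monro conditions:

\sum_{t=1}^{\infty} \eta_t = \infty \quad \text{and} \quad \sum_{t=1}^{\infty} \eta_t^2 < \infty,

i.e. the steps remain large enough in total to reach the optimum, yet shrink fast enough for the gradient noise to average out.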