On-Line Learning in Neural Networks 1999
DOI: 10.1017/cbo9780511569920.003

On-line Learning and Stochastic Approximations

Abstract: The convergence of online learning algorithms is analyzed using the tools of stochastic approximation theory and proved under very weak conditions. A general framework for online learning algorithms is first presented. This framework encompasses the most common online learning algorithms in use today, as illustrated by several examples. Stochastic approximation theory then provides general results describing the convergence of all these learning algorithms at once.
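As a concrete illustration of the kind of algorithm the framework covers, the sketch below implements a generic online update of the form w_{t+1} = w_t - gamma_t * H(z_t, w_t) in Python. The loop structure, the least-mean-squares example, and all names are illustrative assumptions, not code from the chapter.

import numpy as np

def online_learn(update_direction, sample_stream, w0, gammas):
    """Generic online update: w_{t+1} = w_t - gamma_t * H(z_t, w_t)."""
    w = np.asarray(w0, dtype=float)
    for gamma, z in zip(gammas, sample_stream):
        w = w - gamma * update_direction(z, w)
    return w

# Example: online least-mean-squares, where H((x, y), w) = (w.x - y) * x.
def lms_stream(n, w_true, rng):
    for _ in range(n):
        x = rng.normal(size=w_true.shape)
        yield x, x.dot(w_true) + 0.01 * rng.normal()

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])
gammas = (0.1 / (1.0 + 0.01 * t) for t in range(5000))
w_hat = online_learn(lambda z, w: (w.dot(z[0]) - z[1]) * z[0],
                     lms_stream(5000, w_true, rng),
                     np.zeros(2), gammas)
print(w_hat)  # approaches w_true as the stream grows

With a decreasing step-size sequence gammas, this is exactly the kind of stochastic update whose convergence the chapter's stochastic approximation results cover.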

Cited by 707 publications (870 citation statements). References 25 publications.
“…Stochastic gradient descent: The Pegasos algorithm is an application of a stochastic sub-gradient method (see for example [25,34]). In the context of machine learning problems, the efficiency of the stochastic gradient approach has been studied in [26,1,3,27,6,5]. In particular, it has been claimed and experimentally observed that, "Stochastic algorithms yield the best generalization performance despite being the worst optimization algorithms".…”
Section: Introduction (mentioning; confidence: 99%)
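For concreteness, here is a minimal sketch of the stochastic sub-gradient step that Pegasos-style solvers apply to the L2-regularized hinge loss. The objective, the 1/(lambda*t) step size, the toy data, and all names are illustrative assumptions rather than the exact algorithm of the cited works.

import numpy as np

def pegasos_style_step(w, x_i, y_i, lam, t):
    """One stochastic sub-gradient step on (lam/2)*||w||^2 + max(0, 1 - y_i * <w, x_i>)."""
    eta = 1.0 / (lam * t)            # Pegasos-style decreasing step size
    grad = lam * w                   # gradient of the regularizer
    if y_i * x_i.dot(w) < 1.0:       # hinge term is active at this example
        grad = grad - y_i * x_i      # sub-gradient of the hinge loss
    return w - eta * grad

# Toy usage on linearly separable data drawn here only for illustration.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)
w, lam = np.zeros(2), 0.1
for t in range(1, 20001):
    i = rng.integers(len(y))         # sample one example per iteration
    w = pegasos_style_step(w, X[i], y[i], lam, t)
print(w, np.mean(np.sign(X.dot(w)) == y))  # learned weights and training accuracy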
“…where B is a batch sampled from X and |B| is the batch size, η is the learning rate and t is the iteration index [2]. These methods can be interpreted as gradient descent using noisy gradients, which are often referred to as mini-batch gradients with the specified batch size.…”
Section: Introduction (mentioning; confidence: 99%)
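The equation this quote refers to did not survive extraction; the update the authors appear to describe is the standard mini-batch stochastic gradient step, reconstructed here as an assumption:

w_{t+1} = w_t - \frac{\eta}{|B|} \sum_{x \in B} \nabla_w \ell(w_t, x)

where η is the learning rate, B the batch sampled at iteration t, and ∇_w ℓ the per-example gradient of the loss.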
“…It can be shown (see e.g. [18]) that SGD converges almost surely towards the optimum, if the learning rates fulfill…”
Section: Stochastic Gradient Descent (mentioning; confidence: 99%)
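Although the quote is truncated, the learning-rate conditions invoked in this stochastic approximation setting are typically the classical Robbins-Monro conditions:

\sum_{t=1}^{\infty} \eta_t = \infty \quad \text{and} \quad \sum_{t=1}^{\infty} \eta_t^2 < \infty,

i.e. the steps remain large enough in total to reach the optimum, yet shrink fast enough for the gradient noise to average out.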