2016
DOI: 10.1214/15-aos1391

Nonparametric stochastic approximation with large step-sizes

Abstract: We consider the random-design least-squares regression problem within the reproducing kernel Hilbert space (RKHS) framework. Given a stream of independent and identically distributed input/output data, we aim to learn a regression function within an RKHS H, even if the optimal predictor (i.e., the conditional expectation) is not in H. In a stochastic approximation framework where the estimator is updated after each observation, we show that the averaged unregularized least-mean-square algorithm (a form of stoc…
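As a concrete illustration of the algorithm the abstract describes, the sketch below implements unregularized online least-mean-squares in an RKHS with Polyak-Ruppert (iterate) averaging and a constant step-size. It is a minimal sketch only: the function names (`averaged_kernel_lms`, `gaussian_kernel`), the Gaussian kernel, the step-size value and the synthetic data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def gaussian_kernel(x, xp, bandwidth=0.2):
    """A positive-definite kernel; the Gaussian choice here is an assumption."""
    return float(np.exp(-np.sum((x - xp) ** 2) / (2.0 * bandwidth ** 2)))

def averaged_kernel_lms(stream, step_size=0.5, kernel=gaussian_kernel):
    """Unregularized online least-mean-squares in an RKHS with iterate averaging.

    `stream` yields i.i.d. pairs (x, y). Both the running iterate g_n and its
    Polyak-Ruppert average are kernel expansions over the observed inputs, so
    only their coefficients are stored.
    """
    centers = []   # observed inputs x_1, ..., x_n (expansion centres)
    coef = []      # coefficients of the current iterate g_n
    avg = []       # coefficients of the averaged estimator \bar g_n
    for n, (x, y) in enumerate(stream, start=1):
        # Prediction of the current iterate at the new input.
        pred = sum(c * kernel(xi, x) for c, xi in zip(coef, centers))
        # SGD step: g_n = g_{n-1} - gamma * (g_{n-1}(x_n) - y_n) K(x_n, .)
        centers.append(x)
        coef.append(-step_size * (pred - y))
        avg.append(0.0)
        # Online averaging: \bar g_n = \bar g_{n-1} + (g_n - \bar g_{n-1}) / n
        avg = [a + (c - a) / n for a, c in zip(avg, coef)]

    def predict(x_new):
        return sum(a * kernel(xi, x_new) for a, xi in zip(avg, centers))
    return predict

# Illustrative usage on a synthetic stream (all values are made up):
rng = np.random.default_rng(0)
stream = [(np.array([t]), np.sin(6 * t) + 0.1 * rng.standard_normal())
          for t in rng.uniform(0.0, 1.0, size=500)]
f_hat = averaged_kernel_lms(stream, step_size=0.5)
print(f_hat(np.array([0.25])))
```

The estimator is kept in its natural kernel-expansion form, so no regularization or projection is needed; averaging the iterates is what the paper relies on to tolerate large constant step-sizes.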

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
3
1
1

Citation Types

10
164
0

Year Published

2017
2017
2023
2023

Publication Types

Select...
6
1

Relationship

0
7

Authors

Journals

citations
Cited by 101 publications
(174 citation statements)
references
References 34 publications
10
164
0
Order By: Relevance
“…It is well known that the operator $L_K$ is also well-defined on $V = \mathcal{H}_K$, that it is trace-class and positive semi-definite on $\mathcal{H}_K$, and that $A^2 = L_K(L^2(d\rho)) = L_K^{1/2}(\mathcal{H}_K)$. Thus, our result recovers rates for the noiseless case analogous to those known in online learning with kernels for similar approximation algorithms [5,10,11], where the spaces defined in terms of the spectral decomposition of $L_K$ often serve as smoothness classes.…”
Section: Stochastic Approximation in RKHS (supporting)
Confidence: 77%
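For context, the identities quoted above rest on the standard integral operator associated with the kernel; the display below is a minimal LaTeX rendering of the usual definitions, with notation and regularity conditions assumed rather than taken from the citing paper.

```latex
% Integral (covariance) operator of a bounded kernel K on L^2(d\rho), and the
% standard characterization of the RKHS and of the smoother class A^2:
\[
  (L_K f)(x) = \int_{\mathcal{X}} K(x, x')\, f(x')\, d\rho(x'),
  \qquad
  \mathcal{H}_K = L_K^{1/2}\bigl(L^2(d\rho)\bigr),
  \qquad
  A^2 = L_K\bigl(L^2(d\rho)\bigr) = L_K^{1/2}(\mathcal{H}_K).
\]
```

Applying $L_K^{1/2}$ to $\mathcal{H}_K = L_K^{1/2}(L^2(d\rho))$ gives the last equality, which is why $A^2$ sits inside $\mathcal{H}_K$ as a smoothness class.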
“…There are also several relevant works in the context of machine learning [27,26,17,7]. Ying and Pontil [27] studied an online least-squares gradient descent algorithm in a reproducing kernel Hilbert space (RKHS), and presented a novel capacity-independent approach to derive bounds on the generalization error.…”
Section: Introduction (mentioning)
Confidence: 99%
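For reference, the online least-squares gradient descent recursion studied in this line of work takes the generic form below (a sketch in standard notation, not a quotation from [27]):

```latex
% One pass over the data stream: at step t, observe (x_t, y_t) and update the
% current estimate along the negative gradient of the squared loss at that point,
\[
  f_{t+1} = f_t - \gamma_t \bigl(f_t(x_t) - y_t\bigr)\, K(x_t, \cdot),
  \qquad f_1 = 0,
\]
% with step-sizes \gamma_t that are constant or polynomially decaying.
```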
“…However, the rates derived in Theorem 1 still suffer from a saturation problem: when $r > 1 - \beta/2$, the rates do not improve further. The same problem is also observed in [5] when establishing capacity-dependent rates for the averaging scheme of algorithm (3), which we will briefly discuss in Section 3.…”
Section: Results (mentioning)
Confidence: 60%
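To make the saturation effect concrete: capacity-dependent bounds in this literature typically have the shape below, where $r$ is the regularity (source condition) of the target and $\beta$ the capacity (eigenvalue-decay) parameter; the specific exponent and threshold are written here only for illustration and are not quoted from the citing paper.

```latex
% Illustrative shape of a capacity-dependent rate that saturates: the exponent
% improves with the regularity r only up to a threshold r_0 (here r_0 = 1 - beta/2,
% matching the excerpt above); beyond it the rate stays the same.
\[
  \mathbb{E}\,\bigl\|f_T - f_\rho\bigr\|_{L^2(d\rho)}^{2}
  = O\!\Bigl(T^{-\frac{2\min(r,\,r_0)}{2\min(r,\,r_0)+\beta}}\Bigr),
  \qquad r_0 = 1 - \tfrac{\beta}{2}.
\]
```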
“…We would like to stress that the analysis we establish here is specific to the last iterate of algorithm (3). There is already a sizable literature on improving convergence results via averaging schemes in online learning; see, e.g., [5]. Averaging the outputs of the iterates may produce more robust solutions [11], but it also slows down training in practical implementations [12].…”
Section: Introduction (mentioning)
Confidence: 99%