2000
DOI: 10.1137/s0363012997331639

The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning

Abstract: It is shown here that stability of the stochastic approximation algorithm is implied by the asymptotic stability of the origin for an associated ODE. This in turn implies convergence of the algorithm. Several specific classes of algorithms are considered as applications. It is found that the results provide (i) a simpler derivation of known results for reinforcement learning algorithms; (ii) a proof for the first time that a class of asynchronous stochastic approximation algorithms are convergent without using…
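For orientation, the setting the abstract refers to can be sketched in standard stochastic-approximation notation (a paraphrase, not text from the paper): the algorithm updates a parameter by

$$\theta_{n+1} = \theta_n + a_n\,\bigl[h(\theta_n) + M_{n+1}\bigr],$$

where the step sizes satisfy $\sum_n a_n = \infty$ and $\sum_n a_n^2 < \infty$, and $M_{n+1}$ is a martingale-difference noise term. The associated ODE is $\dot\theta(t) = h(\theta(t))$; the paper relates stability and convergence of the iterates $\{\theta_n\}$ to asymptotic stability of this ODE and of a scaled limiting ODE.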

Cited by 434 publications (451 citation statements)
References 18 publications
“…This result was subsequently pursued for the average-reward algorithms (Abounadi et al., 2001), which also exploited results for non-expansive mappings (Borkar and Soumyanath, 1997). A more general ODE result that can be used for both discounted and average-reward cases was proposed later (Borkar and Meyn, 2000); this result employs notions of fluid limits (Dai and Meyn, 1995). Most of these results (see however Abounadi et al. (2002)) require showing a priori boundedness of the iterate, which is possible under some conditions (Bertsekas and Tsitsiklis, 1996; Borkar and Meyn, 2000; Gosavi, 2006), and the existence of some asymptotic properties of the ODE.…”
Section: Semi-Markov Decision Problems (mentioning)
Confidence: 99%
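As a rough gloss on the boundedness condition mentioned in this excerpt (our paraphrase of the usual statement of the Borkar–Meyn criterion, not quoted from the cited papers): one forms the scaled, fluid-limit vector field

$$h_\infty(x) = \lim_{c \to \infty} \frac{h(cx)}{c},$$

assumed to exist, and requires the origin to be an asymptotically stable equilibrium of $\dot x = h_\infty(x)$. Under this condition the iterates remain bounded almost surely, and convergence then follows from asymptotic stability properties of $\dot\theta = h(\theta)$ itself.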
“…A more general ODE result that can be used for both discounted and average-reward cases was proposed later (Borkar and Meyn, 2000); this result employs notions of fluid limits (Dai and Meyn, 1995). Most of these results (see however Abounadi et al. (2002)) require showing a priori boundedness of the iterate, which is possible under some conditions (Bertsekas and Tsitsiklis, 1996; Borkar and Meyn, 2000; Gosavi, 2006), and the existence of some asymptotic properties of the ODE. Once this is accomplished, a critical lemma from Hirsch (1989) is employed to prove convergence w.p.1.…”
Section: Semi-Markov Decision Problems (mentioning)
Confidence: 99%
“…In fact, in all the cases when we apply this result, A will be negative definite. The proof of convergence here follows in a straightforward manner from the results in [1], [2].…”
Section: A Convergence Analysis (mentioning)
Confidence: 99%
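A brief note on why negative definiteness of A is the useful property in such applications (our gloss, not part of the quoted text): if the mean drift is affine, $h(\theta) = A\theta + b$, then the scaled limit is

$$h_\infty(x) = \lim_{c \to \infty} \frac{A(cx) + b}{c} = Ax,$$

and $x^\top A x < 0$ for $x \ne 0$ makes the origin globally asymptotically stable for $\dot x = Ax$ (take $V(x) = \|x\|^2$ as a Lyapunov function), so the stability hypotheses of the ODE method are satisfied.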
“…We verify Assumptions (A1) and (A2) of [2]. Let $G_t = \sigma(u_s, \theta_s,\, s \le t;\ \phi_s,\, s < t)$, $t \ge 0$, be an associated sequence of sigma fields.…”
Section: A Convergence Analysis (mentioning)
Confidence: 99%
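For readers without [2] at hand, the assumptions being verified are roughly the following (our paraphrase; the precise statements should be taken from the paper itself): (A1) the drift $h$ is Lipschitz, the scaled limit $h_\infty(x) = \lim_{c\to\infty} h(cx)/c$ exists, the origin is asymptotically stable for $\dot x = h_\infty(x)$, and $\dot\theta = h(\theta)$ has a globally asymptotically stable equilibrium; (A2) the noise terms form a martingale-difference sequence with respect to a filtration such as the $G_t$ above, with conditional second moments bounded as $\mathbb{E}\bigl[\|M_{t+1}\|^2 \mid G_t\bigr] \le C\,\bigl(1 + \|\theta_t\|^2\bigr)$ for some constant $C$.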