The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning

Borkar, Vivek S.; Meyn, Sean

doi:10.1137/s0363012997331639

Cited by 434 publications

(451 citation statements)

References 18 publications

Supporting

Mentioning

445

Contrasting

Order By: Relevance

“…This result was subsequently pursued for the average-reward algorithms (Abounadi et al, 2001) which also exploited results for non-expansive mappings (Borkar and Soumyanath, 1997). A more general ODE result that can be used for both discounted and average reward cases was proposed later (Borkar and Meyn, 2000); this result employs notions of fluid limits (Dai and Meyn, 1995). Most of these results (see however Abounadi et al (2002)) require showing apriori boundedness of the iterate, which is possible under some conditions (Bertsekas and Tsitsiklis, 1996;Borkar and Meyn, 2000;Gosavi, 2006), and the existence of some asymptotic properties of the ODE.…”

Section: Semi-markov Decision Problemsmentioning

confidence: 99%

“…A more general ODE result that can be used for both discounted and average reward cases was proposed later (Borkar and Meyn, 2000); this result employs notions of fluid limits (Dai and Meyn, 1995). Most of these results (see however Abounadi et al (2002)) require showing apriori boundedness of the iterate, which is possible under some conditions (Bertsekas and Tsitsiklis, 1996;Borkar and Meyn, 2000;Gosavi, 2006), and the existence of some asymptotic properties of the ODE. Once this is accomplished, a critical lemma from Hirsch (1989) is employed to prove convergence w.p.1.…”

Section: Semi-markov Decision Problemsmentioning

confidence: 99%

See 1 more Smart Citation

Reinforcement Learning: A Tutorial Survey and Recent Advances

Gosavi

2009

INFORMS Journal on Computing

258

View full text Add to dashboard Cite

In the last few years, Reinforcement Learning (RL), also called adaptive (or approximate) dynamic programming (ADP), has emerged as a powerful tool for solving complex sequential decision-making problems in control theory. Although seminal research in this area was performed in the artificial intelligence (AI) community, more recently, it has attracted the attention of optimization theorists because of several noteworthy success stories from operations management. It is on large-scale and complex problems of dynamic optimization, in particular the Markov decision problem (MDP) and its variants, that the power of RL becomes more obvious. It has been known for many years that on large-scale MDPs, the curse of dimensionality and the curse of modeling render classical dynamic programming (DP) ineffective. The excitement in RL stems from its direct attack on these curses, allowing it to solve problems that were considered intractable, via classical DP, in the past. The success of RL is due to its strong mathematical roots in the principles of DP, Monte Carlo simulation, function approximation, and AI. Topics treated in some detail in this survey are: Temporal differences, Q-Learning, semi-MDPs and stochastic games. Several recent advances in RL, e.g., policy gradients and hierarchical RL, are covered along with references. Pointers to numerous examples of applications are provided. This overview is aimed at uncovering the mathematical roots of this science, so that readers gain a clear understanding of the core concepts and are able to use them in their own research. The survey points to more than 100 references from the literature.

show abstract

Section: Semi-markov Decision Problemsmentioning

confidence: 99%

Section: Semi-markov Decision Problemsmentioning

confidence: 99%

Reinforcement Learning: A Tutorial Survey and Recent Advances

Gosavi

2009

INFORMS Journal on Computing

258

View full text Add to dashboard Cite

show abstract

“…In fact, in all the cases when we apply this result, A will be negative definite. The proof of convergence here follows in a straightforward manner from the results in [1], [2].…”

Section: A Convergence Analysismentioning

confidence: 99%

“…We verify Assumptions (A1) and (A2) of [2]. Let G t = σ(u s , θ s , s ≤ t; φ s , s < t), t ≥ 0 be an associated sequence of sigma fields.…”

Section: A Convergence Analysismentioning

confidence: 99%

See 1 more Smart Citation

LMS-2: Towards an algorithm that is as cheap as LMS and almost as efficient as RLS

Yao

Bhatnagar

Szepesvári

2009

Proceedings of the 48h IEEE Conference on Decision and Control (CDC) Held Jointly With 2009 28th Chinese Control Conference

View full text Add to dashboard Cite

Abstract-We consider linear prediction problems in a stochastic environment. The least mean square (LMS) algorithm is a well-known, easy to implement and computationally cheap solution to this problem. However, as it is well known, the LMS algorithm, being a stochastic gradient descent rule, may converge slowly. The recursive least squares (RLS) algorithm overcomes this problem, but its computational cost is quadratic in the problem dimension. In this paper we propose a two timescale stochastic approximation algorithm which, as far as its slower timescale is considered, behaves the same way as the RLS algorithm, while it is as cheap as the LMS algorithm. In addition, the algorithm is easy to implement. The algorithm is shown to give estimates that converge to the best possible estimate with probability one. The performance of the algorithm is tested in two examples and it is found that it may indeed offer some performance gain over the LMS algorithm.

show abstract

Simultaneous Perturbation and Finite Difference Methods

Bhatnagar

2011

Wiley Encyclopedia of Operations Research and Management Science

View full text Add to dashboard Cite

This article presents a survey of simultaneous perturbation and finite difference algorithms. One of the most important in this category of algorithms is the simultaneous perturbation stochastic approximation (SPSA) algorithm that has been used in a wide range of settings. The article presents an overview of the theory behind SPSA as well as the recent development of other finite‐difference algorithms along the lines of SPSA. The article also gives an overview of the applications of SPSA and SPSA‐type algorithms in various disciplines of engineering and science.

show abstract

The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning

Cited by 434 publications

References 18 publications

Reinforcement Learning: A Tutorial Survey and Recent Advances

Reinforcement Learning: A Tutorial Survey and Recent Advances

LMS-2: Towards an algorithm that is as cheap as LMS and almost as efficient as RLS

Simultaneous Perturbation and Finite Difference Methods

Contact Info

Product

Resources

About