2014
DOI: 10.1007/978-3-662-44851-9_5

Fast LSTD Using Stochastic Approximation: Finite Time Analysis and Application to Traffic Control

Abstract: We propose a stochastic approximation based method with randomisation of samples for policy evaluation using the least squares temporal difference (LSTD) algorithm. Our method results in an O(d) improvement in complexity in comparison to regular LSTD, where d is the dimension of the data. We provide convergence rate results for our proposed method, both in high probability and in expectation. Moreover, we also establish that using our scheme in place of LSTD does not impact the rate of convergence of the appro…
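
As a rough illustration of the idea in the abstract, the sketch below draws one stored transition uniformly at random per iteration and applies a TD(0)-style stochastic approximation update, so each step only touches vectors of length d. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `fast_lstd_sketch`, the `(phi, reward, phi_next)` sample layout, and the step-size schedule `c / (c + n)` are all hypothetical choices made for this example.

```python
import random


def fast_lstd_sketch(samples, dim, discount=0.9, c=100.0, n_iters=5000, seed=0):
    """Approximate the LSTD solution with randomised TD(0)-style updates.

    samples: list of (phi, reward, phi_next) tuples, where phi and phi_next
             are feature lists of length dim (hypothetical layout).
    Returns the parameter vector theta as a list of floats.
    """
    rng = random.Random(seed)
    theta = [0.0] * dim
    for n in range(1, n_iters + 1):
        # Randomisation of samples: uniform draw with replacement from the batch.
        phi, reward, phi_next = rng.choice(samples)
        # Value estimates are inner products, so this costs O(d) per iteration.
        v = sum(t * f for t, f in zip(theta, phi))
        v_next = sum(t * f for t, f in zip(theta, phi_next))
        td_error = reward + discount * v_next - v
        # Assumed decaying step size; the paper's schedule may differ.
        step = c / (c + n)
        theta = [t + step * td_error * f for t, f in zip(theta, phi)]
    return theta


if __name__ == "__main__":
    # One self-looping state with reward 1 and discount 0.5: the value is 1 / (1 - 0.5).
    batch = [([1.0], 1.0, [1.0])]
    print(fast_lstd_sketch(batch, dim=1, discount=0.5))
```

By contrast, regular LSTD builds and solves a d-by-d linear system, which is where the per-iteration saving claimed in the abstract would come from.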

Cited by 18 publications
(15 citation statements)
References 12 publications
“…Analysis of related algorithms: A number of papers analyze algorithms related to and inspired by the classic TD algorithm. First, among others, Antos et al [2008], Pires and Szepesvári [2012], Prashanth et al [2013] and Tu and Recht [2018] analyze least-squares temporal difference learning (LSTD). Yu and Bertsekas [2009] study the related least-squares policy iteration algorithm.…”
Section: Asymptotic Analysis Of Stochastic Approximation
mentioning
confidence: 99%
“…This paper is an extended version of an earlier work (see Prashanth et al 2014). This work corrects the errors in the earlier work by using significant deviations in the proofs, and includes additional simulation experiments.…”
mentioning
confidence: 84%
“…This work corrects the errors in the earlier work by using significant deviations in the proofs, and includes additional simulation experiments. Finally, by Narayanan and Szepesvári (2017), the authors list a few problems with the results and proofs in the conference version (Prashanth et al 2014), and the corrections incorporated in this work address the comments by Narayanan and Szepesvári (2017).…”
mentioning
confidence: 99%
“…The analysis is a bit different than the one in Lazaric et al [2010b] and the bound is weaker in terms of d and ν. Another recent result is by Prashanth et al [2014] that use stochastic approximation to solve LSTD(0), where the resulting algorithm is exactly TD(0) with random sampling (samples are drawn i.i.d. and not from a trajectory), and report a Markov design bound (the bound is computed only at the states used by the algorithm) of O( d nν ) for LSTD(0).…”
Section: Related Work
mentioning
confidence: 99%