2009
DOI: 10.1007/s10994-009-5128-4

Hybrid least-squares algorithms for approximate policy evaluation

Abstract: The goal of approximate policy evaluation is to "best" represent a target value function according to a specific criterion. Different algorithms offer different choices of the optimization criterion. Two popular least-squares algorithms for performing this task are the Bellman residual method, which minimizes the Bellman residual, and the fixed point method, which minimizes the projection of the Bellman residual. When used within policy iteration, the fixed point algorithm tends to ultimately find better perfo…
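The two criteria in the abstract can be made concrete with a small model-based sketch. It assumes things the abstract does not state: linear features `Phi` (so V ≈ Φw), a known transition matrix `P` and expected-reward vector `R` for the fixed policy, and a state weighting `D`. The Bellman residual (BR) weights minimise ‖R + γPΦw − Φw‖²_D, while the fixed point (FP, LSTD-style) weights make the D-weighted projection of that residual onto the features vanish. All function names are hypothetical, and `blended_weights` (a convex combination of the two linear systems with parameter `xi`) is only an illustrative interpolation, not necessarily the hybrid formulation of Johns et al.

```python
import numpy as np


def _systems(Phi, P, R, D, gamma):
    """Return (A_br, b_br, A_fp, b_fp) for linear features V ~ Phi @ w."""
    W = np.diag(D)                   # state weighting (assumed, e.g. on-policy distribution)
    Psi = Phi - gamma * P @ Phi      # Bellman residual is R - Psi @ w
    # BR: normal equations of min_w ||R - Psi w||_D^2
    A_br, b_br = Psi.T @ W @ Psi, Psi.T @ W @ R
    # FP: projected residual is zero, i.e. Phi^T D (R - Psi w) = 0
    A_fp, b_fp = Phi.T @ W @ Psi, Phi.T @ W @ R
    return A_br, b_br, A_fp, b_fp


def bellman_residual_weights(Phi, P, R, D, gamma):
    A_br, b_br, _, _ = _systems(Phi, P, R, D, gamma)
    return np.linalg.solve(A_br, b_br)


def fixed_point_weights(Phi, P, R, D, gamma):
    _, _, A_fp, b_fp = _systems(Phi, P, R, D, gamma)
    return np.linalg.solve(A_fp, b_fp)


def blended_weights(Phi, P, R, D, gamma, xi):
    """Hypothetical blend of the two systems: xi=1 gives BR, xi=0 gives FP."""
    A_br, b_br, A_fp, b_fp = _systems(Phi, P, R, D, gamma)
    return np.linalg.solve(xi * A_br + (1 - xi) * A_fp,
                           xi * b_br + (1 - xi) * b_fp)


def bellman_residual(Phi, P, R, gamma, w):
    """T(Phi w) - Phi w for the fixed policy, where T v = R + gamma P v."""
    return R + gamma * P @ (Phi @ w) - Phi @ w
```

With tabular features (Φ = I) both criteria recover the exact value function; they differ only when the features cannot represent V exactly, in which case the BR weights have the smaller D-weighted Bellman residual by construction.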

Cited by 17 publications (3 citation statements)
References 11 publications

“…Note that this is a control benchmark, rather than value approximation for a fixed policy. Since the goal of RL is to optimize a policy, results on policy optimization are often more meaningful than just obtaining a small Bellman residual which is not sufficient to guarantee that a good policy will be computed (Johns, Petrik, and Mahadevan 2009).…”
Section: Cart-pole (mentioning; confidence: 99%)
“…Once again, $K_i$ is a gain and $r_i + \gamma \phi_{i+1}^\top \theta_{i-1} - \phi_i^\top \theta_{i-1}$ a temporal difference error, to be linked to the Widrow-Hoff update (7). LSTD can be slightly modified to have an improved computational cost [45] (assuming that the features are sparse, which is not necessarily the case), and it can be "mixed" with a residual approach [46].…”
Section: A Least-squares-based Approaches (mentioning; confidence: 99%)
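The citation statement above writes LSTD in a recursive "gain times temporal difference error" form. A minimal sketch of standard recursive LSTD, assuming linear features, transitions given as `(phi_i, r_i, phi_next_i)` NumPy arrays, and an initial regulariser εI so that the first inverse exists (the names `recursive_lstd` and `eps` are hypothetical, and this is not necessarily the exact variant in the quoted survey): the gain $K_i$ comes from a Sherman-Morrison update of the inverse of $A_i = \epsilon I + \sum_j \phi_j (\phi_j - \gamma \phi_{j+1})^\top$, so the result matches the batch solution of $A\theta = \sum_j \phi_j r_j$.

```python
import numpy as np


def recursive_lstd(transitions, k, gamma, eps=1e-3):
    """Recursive LSTD over transitions (phi_i, r_i, phi_next_i).

    C tracks A^{-1}, where A = eps*I + sum_i phi_i (phi_i - gamma*phi_next_i)^T,
    via Sherman-Morrison; theta equals C @ sum_i phi_i r_i after every step.
    """
    theta = np.zeros(k)
    C = np.eye(k) / eps                  # inverse of the initial regulariser eps*I
    for phi, r, phi_next in transitions:
        v = phi - gamma * phi_next       # A grows by the rank-one term outer(phi, v)
        C_phi = C @ phi
        gain = C_phi / (1.0 + v @ C_phi)                     # K_i, from the old C
        td_error = r + gamma * phi_next @ theta - phi @ theta  # TD error with theta_{i-1}
        theta = theta + gain * td_error  # theta_i = theta_{i-1} + K_i * delta_i
        C = C - np.outer(gain, v @ C)    # Sherman-Morrison update of A^{-1}
    return theta
```

Each step costs O(k²) instead of the O(k³) of re-solving the batch system; the `eps` term stays in A, so it acts as a small ridge-like bias whose influence shrinks as transitions accumulate.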
“…In "Hybrid least-squares algorithms for approximate policy evaluation" (Johns et al 2009) combine two key methods of approximate policy evaluation into a stronger alternative.…”
Section: Papers Appearing In the Journal Of Machine Learning (mentioning; confidence: 99%)