Hybrid least-squares algorithms for approximate policy evaluation

Johns, Jeff; Petrik, Marek; Mahadevan, Sridhar

doi:10.1007/s10994-009-5128-4

Cited by 17 publications

(3 citation statements)

References 11 publications

“…Note that this is a control benchmark, rather than value approximation for a fixed policy. Since the goal of RL is to optimize a policy, results on policy optimization are often more meaningful than just obtaining a small Bellman residual which is not sufficient to guarantee that a good policy will be computed (Johns, Petrik, and Mahadevan 2009).…”

Section: Cart-pole (mentioning)

confidence: 99%

Fast Feature Selection for Linear Value Function Approximation

Behzadian, Gharatappeh, and Petrik (2019), ICAPS

Self Cite

Linear value function approximation is a standard approach to solving reinforcement learning problems with large state spaces. Since designing good approximation features is difficult, automatic feature selection is an important research topic. We propose a new method for feature selection that is based on a low-rank factorization of the transition matrix. Our approach derives features directly from high-dimensional raw inputs, such as image data. The method is easy to implement using SVD, and our experiments show that it is faster and more stable than alternative methods.
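
The abstract describes deriving features from a low-rank factorization of the transition matrix, computed with SVD. The Python sketch below only illustrates that idea under stated assumptions and is not the authors' algorithm: it assumes a known n×n transition matrix P for a fixed policy, takes the top-k left singular vectors of P as features, and then evaluates the policy with model-based LSTD under uniform state weighting. The helper names svd_features and lstd_solve are hypothetical.

```python
import numpy as np


def svd_features(P, k):
    """Return an (n, k) feature matrix from the top-k left singular vectors of P.

    Assumption (illustrative only): features come from a truncated SVD of the
    policy's transition matrix P; the published method may differ in details.
    """
    U, _, _ = np.linalg.svd(P)  # singular values are returned in descending order
    return U[:, :k]


def lstd_solve(Phi, P, R, gamma):
    """Model-based LSTD fixed point with uniform state weighting.

    Solves Phi^T (Phi - gamma * P @ Phi) theta = Phi^T R for theta.
    """
    A = Phi.T @ (Phi - gamma * P @ Phi)
    b = Phi.T @ R
    return np.linalg.solve(A, b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, k, gamma = 6, 3, 0.9
    P = rng.random((n, n))
    P /= P.sum(axis=1, keepdims=True)  # make each row a probability distribution
    R = rng.random(n)

    V_true = np.linalg.solve(np.eye(n) - gamma * P, R)  # exact value function
    Phi = svd_features(P, k)
    V_hat = Phi @ lstd_solve(Phi, P, R, gamma)
    print("max abs error with", k, "features:", np.abs(V_hat - V_true).max())
```

With k equal to the number of states the features span the whole space, so the LSTD fixed point coincides with the exact value function; with smaller k the approximation quality depends on how much of the relevant structure the leading singular vectors capture.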

“…Once again, $K_i$ is a gain and $r_i + \gamma \phi_{i+1}^\top \theta_{i-1} - \phi_i^\top \theta_{i-1}$ a temporal difference error, to be linked to the Widrow-Hoff update (7). LSTD can be slightly modified to have an improved computational cost [45] (assuming that the features are sparse, which is not necessarily the case), and it can be "mixed" with a residual approach [46].…”

Section: A. Least-squares-based Approaches (mentioning)

confidence: 99%

Algorithmic Survey of Parametric Value Function Approximation

Geist and Pietquin (2013), IEEE Trans. Neural Netw. Learning Syst.

Reinforcement learning (RL) is a machine learning answer to the optimal control problem. It consists of learning an optimal control policy through interactions with the system to be controlled, the quality of this policy being quantified by the so-called value function. A recurrent subtopic of RL concerns computing an approximation of this value function when the system is too large for an exact representation. This survey reviews state-of-the-art methods for (parametric) value function approximation by grouping them into three main categories: bootstrapping, residual, and projected fixed-point approaches. Related algorithms are derived by considering one of the associated cost functions and a specific minimization method, generally a stochastic gradient descent or a recursive least-squares approach.
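
The quote in this entry describes LSTD in recursive form, with a gain $K_i$ multiplying a temporal difference error, and the abstract mentions recursive least-squares minimization. The Python sketch below shows one standard way to obtain such a recursion, by applying the Sherman-Morrison formula to the LSTD linear system; it is an illustration, not code from the survey. Assumptions: transitions arrive as (phi_i, r_i, phi_next) tuples, the inverse matrix M starts at delta times the identity (equivalent to a ridge term of 1/delta), and the name recursive_lstd is hypothetical.

```python
import numpy as np


def recursive_lstd(transitions, d, gamma, delta=100.0):
    """Recursive LSTD via Sherman-Morrison rank-one updates.

    transitions: iterable of (phi_i, r_i, phi_next), feature vectors of length d.
    M tracks the inverse of A = I/delta + sum_i phi_i (phi_i - gamma * phi_next)^T,
    so the result equals the ridge-regularized batch LSTD solution.
    """
    theta = np.zeros(d)
    M = delta * np.eye(d)
    for phi, r, phi_next in transitions:
        dphi = phi - gamma * phi_next                     # phi_i - gamma * phi_{i+1}
        Mphi = M @ phi
        K = Mphi / (1.0 + dphi @ Mphi)                    # gain K_i
        td_error = r + gamma * phi_next @ theta - phi @ theta
        theta = theta + K * td_error                      # theta_i = theta_{i-1} + K_i * TD error
        M = M - np.outer(K, dphi @ M)                     # rank-one update of A^{-1}
    return theta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, gamma = 4, 0.9
    data = [(rng.random(d), rng.random(), rng.random(d)) for _ in range(20)]
    print(recursive_lstd(data, d, gamma))
```

Each step costs O(d^2) rather than the O(d^3) of re-solving the system from scratch, and after all samples the weights match a batch solve of the same regularized LSTD system.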

“…In "Hybrid least-squares algorithms for approximate policy evaluation" (Johns et al 2009) combine two key methods of approximate policy evaluation into a stronger alternative.…”

Section: Papers Appearing In the Journal Of Machine Learning (mentioning)

confidence: 99%
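
The quote above refers to combining two approximate policy evaluation methods, presumably the least-squares fixed-point (LSTD) solution and Bellman residual minimization. The Python sketch below is an assumption-laden illustration rather than the paper's algorithm: it works in a model-based setting with a known transition matrix P, a state weighting matrix D, and a hypothetical mixing parameter xi that moves the test matrix from Phi (xi = 0, the fixed point) to (I - gamma P) Phi (xi = 1, residual minimization). It also computes the Bellman residual mentioned in the first citation statement. The function names are hypothetical.

```python
import numpy as np


def hybrid_least_squares(Phi, P, R, gamma, xi, D=None):
    """Solve W^T D B theta = W^T D R with B = Phi - gamma P Phi, W = Phi - xi gamma P Phi.

    Illustrative interpolation (assumption): xi = 0 gives the LSTD fixed point,
    xi = 1 gives Bellman residual minimization in the D-weighted norm.
    """
    n = Phi.shape[0]
    D = np.eye(n) if D is None else D
    B = Phi - gamma * P @ Phi        # (I - gamma P) Phi
    W = Phi - xi * gamma * P @ Phi   # test matrix that moves between the two methods
    return np.linalg.solve(W.T @ D @ B, W.T @ D @ R)


def bellman_residual(Phi, P, R, gamma, theta):
    """Euclidean norm of the Bellman residual R + gamma P Phi theta - Phi theta."""
    V = Phi @ theta
    return np.linalg.norm(R + gamma * P @ V - V)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, k, gamma = 8, 3, 0.95
    P = rng.random((n, n))
    P /= P.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    R = rng.random(n)
    Phi = rng.random((n, k))
    for xi in (0.0, 0.5, 1.0):
        theta = hybrid_least_squares(Phi, P, R, gamma, xi)
        print(f"xi={xi}: Bellman residual {bellman_residual(Phi, P, R, gamma, theta):.4f}")
```

With D equal to the identity, xi = 1 attains the smallest Euclidean Bellman residual of any weight vector; as the first citation statement notes, a small Bellman residual alone does not guarantee that a good policy will be computed.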