2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
DOI: 10.1109/adprl.2011.5967353

Approximate reinforcement learning: An overview

Abstract: Reinforcement learning (RL) allows agents to learn how to optimally interact with complex environments. Fueled by recent advances in approximation-based algorithms, RL has obtained impressive successes in robotics, artificial intelligence, control, operations research, etc. However, the scarcity of survey papers about approximate RL makes it difficult for newcomers to grasp this intricate field. With the present overview, we take a step toward alleviating this situation. We review methods for approxim…

Citations: Cited by 47 publications (27 citation statements)
References: 54 publications
“…Besides policy gradient methods, value function based algorithms have also been studied extensively for reinforcement learning in continuous spaces [10], [43]. For example, an interesting Continuous-Action Q-Learning algorithm has been proposed in [43].…”
Section: Related Work
confidence: 99%
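As a rough illustration of the value-function-based, continuous-action setting mentioned in this statement, the sketch below uses a linear-in-features Q-function and approximates the greedy action by searching over a grid of candidate actions. All dimensions, features, and constants are hypothetical, and this is a generic sketch rather than the specific Continuous-Action Q-Learning algorithm of [43].

```python
import numpy as np

# Sketch: Q-learning with a linear-in-features approximator over continuous
# state-action pairs. The greedy action is approximated by maximizing over a
# coarse grid of candidate actions. Features and constants are illustrative.

def features(s, a):
    """Simple polynomial features of the (state, action) pair (assumed form)."""
    return np.array([1.0, s, a, s * a, s ** 2, a ** 2])

w = np.zeros(6)                                   # Q(s, a) ~= features(s, a) @ w
candidate_actions = np.linspace(-1.0, 1.0, 21)    # discretized action candidates
alpha, gamma = 0.05, 0.95

def greedy_action(s):
    """Approximate arg-max over the continuous action via the candidate grid."""
    q_vals = [features(s, a) @ w for a in candidate_actions]
    return candidate_actions[int(np.argmax(q_vals))]

def q_learning_step(s, a, r, s_next):
    """One sample-by-sample Q-learning update of the parameter vector."""
    global w
    td_target = r + gamma * max(features(s_next, b) @ w for b in candidate_actions)
    td_error = td_target - features(s, a) @ w
    w += alpha * td_error * features(s, a)
```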
“…The optimal Q-function can be found using Policy Iteration or Value Iteration in a model-free manner, using, e.g., NNs as FAs. The optimal Q-function estimate and the optimal controller estimate can be updated from the transition samples in several ways: in online/offline mode, batch mode, or sample-by-sample update [23,46]. A particular class of online RL approaches is represented by the temporal difference-based AAC design that differs from the batch PI and VI approaches, as it avoids alternate batch back-up of the Q-function FA and of the controller FA.…”
Section: Adaptive Actor-Critic Learning for ORM Tracking Control
confidence: 99%
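The statement above distinguishes batch back-ups from sample-by-sample (online) updates of the Q-function approximator. The sketch below contrasts the two modes with a linear approximator standing in for the NN FA; the feature map, dimensions, and step sizes are illustrative assumptions, not the cited actor-critic design.

```python
import numpy as np

# Two update modes for a Q-function approximator, here linear in features.
gamma, alpha = 0.95, 0.01

def phi(s, a):
    """Hypothetical feature vector for the (state, action) pair."""
    return np.array([1.0, s, a, s * a])

def td_update(w, s, a, r, s_next, a_next):
    """Sample-by-sample (online) temporal-difference back-up."""
    delta = r + gamma * phi(s_next, a_next) @ w - phi(s, a) @ w
    return w + alpha * delta * phi(s, a)

def batch_fitted_update(w, transitions, actions):
    """Batch back-up in the spirit of fitted value iteration: regress the
    approximator onto one-step Bellman targets built from stored samples."""
    X = np.array([phi(s, a) for s, a, _, _ in transitions])
    y = np.array([r + gamma * max(phi(s2, b) @ w for b in actions)
                  for _, _, r, s2 in transitions])
    return np.linalg.lstsq(X, y, rcond=None)[0]   # least-squares refit of w
```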
“…Similarly, a quadratic, state-dependent reward generates linear quadratic regulator-type optimal responses [11]. An implicit assumption in these results is the ability of the RL algorithm to efficiently estimate the value function for both optimal and non-optimal control policies, although few results exist about the parametric form of the true value function [12].…”
Section: Literature Review and Background
confidence: 99%
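For the quadratic-reward case mentioned here, the linear quadratic regulator is the standard example in which the true value function has a known parametric (quadratic) form; the relations below restate that textbook result for reference.

```latex
% LQR: dynamics x_{k+1} = A x_k + B u_k, stage cost x_k^\top Q x_k + u_k^\top R u_k.
% The optimal value function is quadratic, V^*(x) = x^\top P x, with P solving
% the discrete algebraic Riccati equation, and the optimal policy is linear:
\[
P = Q + A^{\top}\!\left(P - P B \,(R + B^{\top} P B)^{-1} B^{\top} P\right) A ,
\qquad
u^{*}(x) = -(R + B^{\top} P B)^{-1} B^{\top} P A \, x .
\]
```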
“…where ϕ(x_k) is the basis function vector and w_k is the corresponding parameter vector. Parametric approximation schemes such as state aggregation, tile coding and normalized Gaussian radial basis function (RBF) are widely used in the RL literature as the theoretical analysis is simplified and the rate of parameter convergence is often faster [12][13][14]. Tile coding is simple and computationally efficient, and even though tile coding is a discrete representation of the (continuous) state space, its generalization capacity is reported to be preferable to simple look-up tables.…”
Section: Value Function Approximation
confidence: 99%
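A minimal sketch of the parametric form referred to in this statement, the approximation V(x_k) ≈ ϕ(x_k)ᵀ w_k with normalized Gaussian RBF features, together with a TD(0) parameter update. The centers, width, and step size are assumed values chosen only for illustration.

```python
import numpy as np

# Linear value-function approximation with normalized Gaussian RBF features.
centers = np.linspace(-1.0, 1.0, 9)   # hypothetical RBF centers on a 1-D state
sigma = 0.25                          # assumed RBF width
alpha, gamma = 0.1, 0.95

def phi(x):
    """Normalized Gaussian RBF feature vector phi(x_k)."""
    g = np.exp(-0.5 * ((x - centers) / sigma) ** 2)
    return g / g.sum()

w = np.zeros(len(centers))            # parameter vector w_k

def td0_update(w, x, r, x_next):
    """One TD(0) update of the parameters from a transition (x, r, x_next)."""
    delta = r + gamma * phi(x_next) @ w - phi(x) @ w
    return w + alpha * delta * phi(x)
```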