2013
DOI: 10.1109/tnnls.2013.2247418

Algorithmic Survey of Parametric Value Function Approximation

Abstract: Reinforcement learning (RL) is a machine learning answer to the optimal control problem. It consists of learning an optimal control policy through interactions with the system to be controlled, the quality of this policy being quantified by the so-called value function. A recurrent subtopic of RL concerns computing an approximation of this value function when the system is too large for an exact representation. This survey reviews state-of-the-art methods for (parametric) value function approximation by groupi…

Cited by 70 publications (51 citation statements)
References 68 publications
“…The Markov Decision Process (MDP) provides a mathematical tool to model the dynamic system [3,4,17]. It is defined as a 5-tuple {S, A, P, R, γ}, where S is the state space and A is the (finite) action space.…”
Section: Preliminaries (mentioning)
confidence: 99%
“…the environment) that RL interacts with is generally modeled as a Markov Decision Process (MDP) [8]. An MDP is a tuple {S, A, P, R, γ} [9], [10], [11], where S is (finite) state space and A is (finite) action space. The state transition probability P : S × A × S → [0, 1], from state s to the next state s′ when taking action a, is given by P(s, a, s′).…”
Section: Markov Decision Process (MDP) and Actor-critic Reinforce… (mentioning)
confidence: 99%
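
The tuple {S, A, P, R, γ} quoted above can be written down directly as a small data structure. The following is a minimal sketch, not taken from any of the citing papers: the class name `MDP`, the dictionary encoding of P, the assumption that R maps a (state, action) pair to a scalar reward, the reading of γ as a discount factor in [0, 1], and the two-state toy example are all illustrative choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MDP:
    """Finite MDP {S, A, P, R, gamma} (hypothetical container for illustration)."""
    states: tuple
    actions: tuple
    P: dict       # P[(s, a, s_next)] -> probability in [0, 1]; missing keys mean 0
    R: dict       # R[(s, a)] -> scalar reward (assumed form, not given in the source)
    gamma: float  # discount factor, assumed to lie in [0, 1]

    def transition_prob(self, s, a, s_next):
        """Return P(s, a, s_next), the probability of reaching s_next from s under a."""
        return self.P.get((s, a, s_next), 0.0)

    def is_valid(self, tol=1e-9):
        """Check that P(s, a, .) is a probability distribution for every (s, a)."""
        if not 0.0 <= self.gamma <= 1.0:
            return False
        for s in self.states:
            for a in self.actions:
                probs = [self.transition_prob(s, a, s2) for s2 in self.states]
                if any(p < 0.0 or p > 1.0 for p in probs):
                    return False
                if abs(sum(probs) - 1.0) > tol:
                    return False
        return True


# Toy two-state example (made-up numbers).
toy_mdp = MDP(
    states=("s0", "s1"),
    actions=("stay", "move"),
    P={
        ("s0", "stay", "s0"): 1.0,
        ("s0", "move", "s0"): 0.2,
        ("s0", "move", "s1"): 0.8,
        ("s1", "stay", "s1"): 1.0,
        ("s1", "move", "s0"): 1.0,
    },
    R={("s0", "stay"): 0.0, ("s0", "move"): 1.0,
       ("s1", "stay"): 0.5, ("s1", "move"): 0.0},
    gamma=0.9,
)
```

Storing P sparsely as a dictionary keeps the mapping P : S × A × S → [0, 1] explicit; a dense array indexed by (s, a, s′) would be the more usual choice for larger state spaces.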
“…In LSPI, the action-state value function $Q^{\pi}$ is approximated by a linear parametric architecture with free parameters $w_i$: $Q^{\pi_i}(x,a) = \sum_{j=1}^{h} \phi_j(x,a)\, w_j = \phi^{\top}(x,a)\, w_i$ (11), where $\phi(x,a) \in \mathbb{R}^{h}$ denotes the vector of basis functions or features, and $w_i = [w_1, w_2, \ldots, w_h]^{\top}$ denotes the weight vector. The parameter vector $w_i$ can be adjusted appropriately so that the approximate value function is close enough to the exact one.…”
Section: Problem Statement (mentioning)
confidence: 99%
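
The linear architecture in the last statement amounts to an inner product between a feature vector and a weight vector. The sketch below only evaluates such an approximation and picks a greedy action from it; it does not implement LSPI's weight update. The feature map `features` (one [1, x] block per action, so h = 4 for two actions), the scalar state x, and the weight values are hypothetical choices made for illustration.

```python
def features(x, a, n_actions=2):
    """Hypothetical basis phi(x, a): a [1, x] block in the slot of action a, zeros elsewhere."""
    phi = [0.0] * (2 * n_actions)
    phi[2 * a] = 1.0
    phi[2 * a + 1] = float(x)
    return phi


def q_value(x, a, w):
    """Linear approximation: sum over j of phi_j(x, a) * w_j."""
    phi = features(x, a)
    return sum(p * wj for p, wj in zip(phi, w))


def greedy_action(x, w, n_actions=2):
    """Action with the largest approximate Q-value in state x."""
    return max(range(n_actions), key=lambda a: q_value(x, a, w))


# Made-up weight vector of length h = 4.
w = [0.0, 1.0, 0.5, -1.0]
q0 = q_value(2.0, 0, w)  # 0.0 * 1 + 1.0 * 2.0
q1 = q_value(2.0, 1, w)  # 0.5 * 1 + (-1.0) * 2.0
```

Because the approximation is linear in the weights, adjusting `w` changes every Q-value that shares a nonzero feature, which is what lets a small parameter vector stand in for a table over all state-action pairs.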