2008
DOI: 10.1007/978-3-540-89722-4_9

Basis Expansion in Natural Actor Critic Methods

Abstract: In reinforcement learning, the aim of the agent is to find a policy that maximizes its expected return. Policy gradient methods try to accomplish this goal by directly approximating the policy using a parametric function approximator; the expected return of the current policy is estimated and its parameters are updated by steepest ascent in the direction of the gradient of the expected return with respect to the policy parameters. In general, the policy is defined in terms of a set of basis functions…
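
To ground the truncated description above, here is a minimal sketch of the setup it outlines: a softmax policy defined over a set of basis functions, with parameters updated by steepest ascent on a Monte Carlo estimate of the gradient of the expected return. This is plain REINFORCE on a toy two-state MDP invented for illustration, not the paper's natural actor-critic algorithm; all names and constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS = 2, 2

def phi(s, a):
    # Basis functions: a one-hot feature per (state, action) pair.
    f = np.zeros(N_STATES * N_ACTIONS)
    f[s * N_ACTIONS + a] = 1.0
    return f

def policy_probs(theta, s):
    # Softmax (Gibbs) policy pi(a|s) defined over the basis features.
    prefs = np.array([theta @ phi(s, a) for a in range(N_ACTIONS)])
    prefs -= prefs.max()  # numerical stability
    e = np.exp(prefs)
    return e / e.sum()

def step(s, a):
    # Hypothetical toy MDP: action 1 flips the state; reward only
    # for taking action 1 in state 0.
    return (s if a == 0 else 1 - s), (1.0 if (s == 0 and a == 1) else 0.0)

theta = np.zeros(N_STATES * N_ACTIONS)
alpha, gamma, horizon = 0.1, 0.95, 20

for episode in range(500):
    s, traj = 0, []
    for _ in range(horizon):
        a = rng.choice(N_ACTIONS, p=policy_probs(theta, s))
        s_next, r = step(s, a)
        traj.append((s, a, r))
        s = s_next
    # REINFORCE update: grad log pi(a|s) = phi(s,a) - E_pi[phi(s,.)],
    # each step weighted by the discounted return that follows it.
    G = 0.0
    for (s_t, a_t, r_t) in reversed(traj):
        G = r_t + gamma * G
        p = policy_probs(theta, s_t)
        expected_phi = sum(p[b] * phi(s_t, b) for b in range(N_ACTIONS))
        theta += alpha * G * (phi(s_t, a_t) - expected_phi)
```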

Cited by 6 publications (2 citation statements)
References 15 publications

“…Girgin and Preux [61] improve the performance of natural actor-critic algorithms, by using a neural network for the actor, which includes a mechanism to automatically add hidden layers to the neural network if the accuracy is not sufficient. Enhancing the eNAC method in [16] with this basis expansion method clearly showed its benefits on a cart-pole simulation.…”
Section: Discounted Return Setting
confidence: 99%
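
To make the growth mechanism in this statement concrete, the sketch below expands an actor's hidden layer (its basis) when recent returns plateau. The plateau test, the zero-initialized output weight, and the GrowingActor class are illustrative assumptions, not the cited eNAC enhancement itself.

```python
import numpy as np

rng = np.random.default_rng(1)

class GrowingActor:
    """Actor whose hidden layer doubles as an expandable basis."""
    def __init__(self, n_inputs, n_hidden=1):
        self.w_in = rng.normal(scale=0.1, size=(n_hidden, n_inputs))
        self.w_out = rng.normal(scale=0.1, size=n_hidden)

    def features(self, x):
        # Hidden activations play the role of basis functions.
        return np.tanh(self.w_in @ x)

    def output(self, x):
        return self.w_out @ self.features(x)

    def add_hidden_unit(self):
        # Expand the basis with one new randomly initialized unit;
        # its output weight starts at zero so the policy is unchanged.
        new_row = rng.normal(scale=0.1, size=(1, self.w_in.shape[1]))
        self.w_in = np.vstack([self.w_in, new_row])
        self.w_out = np.append(self.w_out, 0.0)

def maybe_grow(actor, returns, window=20, tol=1e-3):
    # Hypothetical trigger: grow when the average return over the last
    # `window` episodes has stopped improving on the window before it.
    if len(returns) < 2 * window:
        return False
    recent = np.mean(returns[-window:])
    earlier = np.mean(returns[-2 * window:-window])
    if recent - earlier < tol:
        actor.add_hidden_unit()
        return True
    return False
```
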
“…Among on-going work, the way LSPI and the basis function construction process are intertwined needs more work. Although our focus was on the LSPI algorithm in this paper, the approach is restricted neither to LSPI nor to value-based reinforcement learning; [3] demonstrates that the same kind of approach may be embedded in natural actor-critics. In particular, Sigma-Point Policy Iteration (SPPI) and fitted Q-learning may be considered, SPPI being closely related to LSPI, and fitted Q-learning having demonstrated excellent performance and having nice theoretical properties.…”
Section: Results
confidence: 99%
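
Since this statement turns on how LSPI and basis construction interleave, a minimal sketch of LSPI's core LSTD-Q step may help. The sample format, the regularization term, and the lstdq/lspi helpers are assumptions for illustration, not the cited paper's implementation; the comment marks where a basis-construction step would slot in.

```python
import numpy as np

def lstdq(samples, phi, policy, n_features, gamma=0.95, reg=1e-6):
    # Solve A w = b for the Q-function weights of `policy`,
    # from samples (s, a, r, s') and the basis phi(s, a).
    A = reg * np.eye(n_features)
    b = np.zeros(n_features)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A, b)

def lspi(samples, phi, n_features, actions, n_iters=10, gamma=0.95):
    # Alternate LSTD-Q evaluation with greedy policy improvement.
    # The basis phi stays fixed here; a basis-construction step would
    # be interleaved between iterations, which is exactly the
    # interplay the citation statement refers to.
    w = np.zeros(n_features)
    for _ in range(n_iters):
        def greedy(s, w=w):
            return max(actions, key=lambda a: w @ phi(s, a))
        w = lstdq(samples, phi, greedy, n_features, gamma)
    return w
```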