Basis Expansion in Natural Actor Critic Methods

Girgin, Sertan; Preux, Philippe

doi:10.1007/978-3-540-89722-4_9

Cited by 6 publications

(2 citation statements)

References 15 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Girgin and Preux [61] improve the performance of natural actor-critic algorithms, by using a neural network for the actor, which includes a mechanism to automatically add hidden layers to the neural network if the accuracy is not sufficient. Enhancing the eNAC method in [16] with this basis expansion method clearly showed its benefits on a cart-pole simulation.…”

Section: ) Discounted Return Settingmentioning

confidence: 99%

A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients

Grondman

Buşoniu

Lopes

et al. 2012

IEEE Trans. Syst., Man, Cybern. C

841

374

View full text Add to dashboard Cite

Policy gradient based actor-critic algorithms are amongst the most popular algorithms in the reinforcement learning framework. Their advantage of being able to search for optimal policies using low-variance gradient estimates has made them useful in several real-life applications, such as robotics, power control and finance. Although general surveys on reinforcement learning techniques already exist, no survey is specifically dedicated to actor-critic algorithms in particular. This paper therefore describes the state of the art of actorcritic algorithms, with a focus on methods that can work in an online setting and use function approximation in order to deal with continuous state and action spaces. After starting with a discussion on the concepts of reinforcement learning and the origins of actor-critic algorithms, this paper describes the workings of the natural gradient, which has made its way into many actor-critic algorithms in the past few years. A review of several standard and natural actor-critic algorithms follows and the paper concludes with an overview of application areas and a discussion on open issues.

show abstract

Section: ) Discounted Return Settingmentioning

confidence: 99%

A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients

Grondman

Buşoniu

Lopes

et al. 2012

IEEE Trans. Syst., Man, Cybern. C

841

374

View full text Add to dashboard Cite

show abstract

“…Among on-going work, the way LSPI and the basis function construction process are intertwined needs more work. Although, our focus was on LSPI algorithm in this paper, the approach is neither restricted to LSPI, nor value-based reinforcement learning; [3] demonstrates that the same kind of approach may be embedded in natural actor-critics. In particular, Sigma-Point Policy Iteration (SPPI) and fitted Q-learning may be considered, SPPI being closely related to LSPI, and fitted Q-learning having demonstrated excellent performance and having nice theoretical properties.…”

Section: Resultsmentioning

confidence: 99%

Basis Function Construction in Reinforcement Learning Using Cascade-Correlation Learning Architecture

Girgin

Preux

2008

2008 Seventh International Conference on Machine Learning and Applications

Self Cite

View full text Add to dashboard Cite

In reinforcement learning, it is a common practice to map the state(-action) space to a different one using basis functions. This transformation aims to represent the input data in a more informative form that facilitates and improves subsequent steps. As a "good" set of basis functions result in better solutions and defining such functions becomes a challenge with increasing problem complexity, it is beneficial to be able to generate them automatically. In this paper, we propose a new approach based on Bellman residual for constructing basis functions using cascadecorrelation learning architecture. We show how this approach can be applied to Least Squares Policy Iteration algorithm in order to obtain a better approximation of the value function, and consequently improve the performance of the resulting policies. We also present the effectiveness of the method empirically on some benchmark problems.

show abstract

Adaptive Bases for Reinforcement Learning

Castro

Mannor

2010

Machine Learning and Knowledge Discovery in Databases

View full text Add to dashboard Cite

We consider the problem of reinforcement learning using function approximation, where the approximating basis can change dynamically while interacting with the environment. A motivation for such an approach is maximizing the value function fitness to the problem faced. Three errors are considered: approximation square error, Bellman residual, and projected Bellman residual. Algorithms under the actorcritic framework are presented, and shown to converge. The advantage of such an adaptive basis is demonstrated in simulations.

show abstract

Basis Expansion in Natural Actor Critic Methods

Cited by 6 publications

References 15 publications

A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients

A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients

Basis Function Construction in Reinforcement Learning Using Cascade-Correlation Learning Architecture

Adaptive Bases for Reinforcement Learning

Contact Info

Product

Resources

About