2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
DOI: 10.1109/adprl.2011.5967353

Approximate reinforcement learning: An overview

Abstract: Reinforcement learning (RL) allows agents to learn how to optimally interact with complex environments. Fueled by recent advances in approximation-based algorithms, RL has obtained impressive successes in robotics, artificial intelligence, control, operations research, etc. However, the scarcity of survey papers about approximate RL makes it difficult for newcomers to grasp this intricate field. With the present overview, we take a step toward alleviating this situation. We review methods for approxim…

Citations: Cited by 47 publications (27 citation statements)
References: 54 publications
“…Besides policy gradient methods, value function based algorithms have also been studied extensively for reinforcement learning in continuous spaces [10], [43]. For example, an interesting Continuous-Action Q-Learning algorithm has been proposed in [43].…”
Section: Related Work
confidence: 99%
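As a rough illustration of the value-function-based, continuous-action setting mentioned in this statement, the sketch below uses a linear-in-features Q-function and approximates the greedy action by searching over a grid of candidate actions. All dimensions, features, and constants are hypothetical, and this is a generic sketch rather than the specific Continuous-Action Q-Learning algorithm of [43].

```python
import numpy as np

# Sketch: Q-learning with a linear-in-features approximator over continuous
# state-action pairs. The greedy action is approximated by maximizing over a
# coarse grid of candidate actions. Features and constants are illustrative.

def features(s, a):
    """Simple polynomial features of the (state, action) pair (assumed form)."""
    return np.array([1.0, s, a, s * a, s ** 2, a ** 2])

w = np.zeros(6)                                   # Q(s, a) ~= features(s, a) @ w
candidate_actions = np.linspace(-1.0, 1.0, 21)    # discretized action candidates
alpha, gamma = 0.05, 0.95

def greedy_action(s):
    """Approximate arg-max over the continuous action via the candidate grid."""
    q_vals = [features(s, a) @ w for a in candidate_actions]
    return candidate_actions[int(np.argmax(q_vals))]

def q_learning_step(s, a, r, s_next):
    """One sample-by-sample Q-learning update of the parameter vector."""
    global w
    td_target = r + gamma * max(features(s_next, b) @ w for b in candidate_actions)
    td_error = td_target - features(s, a) @ w
    w += alpha * td_error * features(s, a)
```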
“…The optimal Q-function can be found using Policy Iteration or Value Iteration in a model-free manner, using, e.g., NNs as FAs. The optimal Q-function estimate and the optimal controller estimate can be updated from the transition samples in several ways: in online/offline mode, batch mode, or sample-by-sample update [23,46]. A particular class of online RL approaches is represented by the temporal difference-based AAC design that differs from the batch PI and VI approaches, as it avoids alternate batch back-up of the Q-function FA and of the controller FA.…”
Section: Adaptive Actor-Critic Learning for ORM Tracking Control
confidence: 99%
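The statement above distinguishes batch back-ups from sample-by-sample (online) updates of the Q-function approximator. The sketch below contrasts the two modes with a linear approximator standing in for the NN FA; the feature map, dimensions, and step sizes are illustrative assumptions, not the cited actor-critic design.

```python
import numpy as np

# Two update modes for a Q-function approximator, here linear in features.
gamma, alpha = 0.95, 0.01

def phi(s, a):
    """Hypothetical feature vector for the (state, action) pair."""
    return np.array([1.0, s, a, s * a])

def td_update(w, s, a, r, s_next, a_next):
    """Sample-by-sample (online) temporal-difference back-up."""
    delta = r + gamma * phi(s_next, a_next) @ w - phi(s, a) @ w
    return w + alpha * delta * phi(s, a)

def batch_fitted_update(w, transitions, actions):
    """Batch back-up in the spirit of fitted value iteration: regress the
    approximator onto one-step Bellman targets built from stored samples."""
    X = np.array([phi(s, a) for s, a, _, _ in transitions])
    y = np.array([r + gamma * max(phi(s2, b) @ w for b in actions)
                  for _, _, r, s2 in transitions])
    return np.linalg.lstsq(X, y, rcond=None)[0]   # least-squares refit of w
```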
“…Similarly, a quadratic, state-dependent reward generates linear quadratic regulator-type optimal responses [11]. An implicit assumption in these results is the ability of the RL algorithm to efficiently estimate the value function for both optimal and non-optimal control policies, although few results exist about the parametric form of the true value function [12].…”
Section: Literature Review and Background
confidence: 99%
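For the quadratic-reward case mentioned here, the linear quadratic regulator is the standard example in which the true value function has a known parametric (quadratic) form; the relations below restate that textbook result for reference.

```latex
% LQR: dynamics x_{k+1} = A x_k + B u_k, stage cost x_k^\top Q x_k + u_k^\top R u_k.
% The optimal value function is quadratic, V^*(x) = x^\top P x, with P solving
% the discrete algebraic Riccati equation, and the optimal policy is linear:
\[
P = Q + A^{\top}\!\left(P - P B \,(R + B^{\top} P B)^{-1} B^{\top} P\right) A ,
\qquad
u^{*}(x) = -(R + B^{\top} P B)^{-1} B^{\top} P A \, x .
\]
```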
“…where ϕ(x_k) is the basis function vector and w_k is the corresponding parameter vector. Parametric approximation schemes such as state aggregation, tile coding and normalized Gaussian radial basis function (RBF) are widely used in the RL literature as the theoretical analysis is simplified and the rate of parameter convergence is often faster [12][13][14]. Tile coding is simple and computationally efficient, and even though tile coding is a discrete representation of the (continuous) state space, its generalization capacity is reported to be preferable to simple look-up tables.…”
Section: Value Function Approximation
confidence: 99%
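A minimal sketch of the parametric form referred to in this statement, the approximation V(x_k) ≈ ϕ(x_k)ᵀ w_k with normalized Gaussian RBF features, together with a TD(0) parameter update. The centers, width, and step size are assumed values chosen only for illustration.

```python
import numpy as np

# Linear value-function approximation with normalized Gaussian RBF features.
centers = np.linspace(-1.0, 1.0, 9)   # hypothetical RBF centers on a 1-D state
sigma = 0.25                          # assumed RBF width
alpha, gamma = 0.1, 0.95

def phi(x):
    """Normalized Gaussian RBF feature vector phi(x_k)."""
    g = np.exp(-0.5 * ((x - centers) / sigma) ** 2)
    return g / g.sum()

w = np.zeros(len(centers))            # parameter vector w_k

def td0_update(w, x, r, x_next):
    """One TD(0) update of the parameters from a transition (x, r, x_next)."""
    delta = r + gamma * phi(x_next) @ w - phi(x) @ w
    return w + alpha * delta * phi(x)
```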