Approximate dynamic programming using Bellman residual elimination and Gaussian process regression

Bethke, Brett; How, Jonathan P.

doi:10.1109/acc.2009.5160344

Cited by 11 publications

(9 citation statements)

References 62 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…The core of Theorem 3 is that the optimization objectives are different on the left-hand side and right-hand side of (28). Theorem 3 indicates that it will get a better generalization ability when optimizing each value function (to obtain a global value function), respectively, for each local subspace of the state space.…”

Section: Assumptionmentioning

confidence: 99%

“…replacing-kernel reinforcement learning (RKRL) is an online model selection method for GPTD using a sequential Monte-Carlo method [26]. An approach as Bellman residual elimination (BRE) rather than Bellman residual minimization [27] is introduced to KBRL, which emphasizes the fact that the Bellman error is explicitly forced to zero, and BRE(GP) is proposed based on Gaussian process regression [28]. A unifying view of the different approaches is proposed to kernelized value function approximation for RL and demonstrates that several model-free kernelized value function approximators can be viewed as special cases of a novel, model-based value function approximator [29].…”

mentioning

confidence: 99%

See 1 more Smart Citation

Online Selective Kernel-Based Temporal Difference Learning

Chen

Wang

2013

IEEE Trans. Neural Netw. Learning Syst.

View full text Add to dashboard Cite

Abstract-In this paper, an online selective kernel-based temporal difference (OSKTD) learning algorithm is proposed to deal with large scale and/or continuous reinforcement learning problems. OSKTD includes two online procedures: online sparsification and parameter updating for the selective kernelbased value function. A new sparsification method (i.e., a kernel distance-based online sparsification method) is proposed based on selective ensemble learning, which is computationally less complex compared with other sparsification methods. With the proposed sparsification method, the sparsified dictionary of samples is constructed online by checking if a sample needs to be added to the sparsified dictionary. In addition, based on local validity, a selective kernel-based value function is proposed to select the best samples from the sample dictionary for the selective kernel-based value function approximator. The parameters of the selective kernel-based value function are iteratively updated by using the temporal difference (TD) learning algorithm combined with the gradient descent technique. The complexity of the online sparsification procedure in the OSKTD algorithm is O(n). In addition, two typical experiments (Maze and Mountain Car) are used to compare with both traditional and up-to-date O(n) algorithms (GTD, GTD2, and TDC using the kernel-based value function), and the results demonstrate the effectiveness of our proposed algorithm. In the Maze problem, OSKTD converges to an optimal policy and converges faster than both traditional and up-to-date algorithms. In the Mountain Car problem, OSKTD converges, requires less computation time compared with other sparsification methods, gets a better local optima than the traditional algorithms, and converges much faster than the upto-date algorithms. In addition, OSKTD can reach a competitive ultimate optima compared with the up-to-date algorithms.Index Terms-Function approximation, online sparsification, reinforcement learning (RL), selective ensemble learning, selective kernel-based value function.

show abstract

Section: Assumptionmentioning

confidence: 99%

mentioning

confidence: 99%

Online Selective Kernel-Based Temporal Difference Learning

Chen

Wang

2013

IEEE Trans. Neural Netw. Learning Syst.

View full text Add to dashboard Cite

show abstract

“…The contribution of this paper is a novel approach to estimate the ROA of nonlinear systems in a flexible and parallelizable way exploiting Gaussian process (GP) regression to learn the infinite horizon cost function, which can be used as Lyapunov function for stable systems [11]. The infinite horizon cost is learned efficiently with a Gaussian process by exploiting the Bellman equation [12]. Since the learned cost might violate the Lyapunov conditions around the origin due to regression errors, we derive a theorem allowing to extend known regions of attraction through a Lyapunov-like function.…”

Section: Introductionmentioning

confidence: 99%

Local Asymptotic Stability Analysis and Region of Attraction Estimation with Gaussian Processes

Lederer

Hirche

2019

2019 IEEE 58th Conference on Decision and Control (CDC)

View full text Add to dashboard Cite

Determining the region of attraction of nonlinear systems is a difficult problem, which is typically approached by means of Lyapunov theory. State of the art approaches either provide high flexibility regarding the Lyapunov function or parallelizability of computation. Aiming at both, flexibility and parallelizability, we propose a method to obtain a Lyapunov-like function for stability analysis by learning the infinite horizon cost function with a Gaussian process based on approximate dynamic programming. We develop a novel approach to characterize the region of attraction using a Lyapunov-like function, which is analyzed with a sampling-based interval analysis algorithm. Since each interval can be examined independently, the algorithm allows both parallelizable analysis and flexible construction of the Lyapunov-like function.R + /N + all real/integer positive numbers, In the n × n identity matrix, · the Euclidean norm and < ·, · > the scalar product.

show abstract

“…This work has led to algorithms such as GPTD [8], an approach which uses temporal differences to learn a Gaussian process representation of the cost-to-go function, and GPDP [9], which is an approximate value iteration scheme based on a similar Gaussian process cost-to-go representation. Another recently-developed approach, known as Bellman Residual Elimination (BRE) [1], [2], uses kernelbased regression to solve a system of Bellman equations over a small set of sample states.…”

Section: Introductionmentioning

confidence: 99%

Approximate dynamic programming using model-free Bellman Residual Elimination

Bethke

How

2010

Proceedings of the 2010 American Control Conference

Self Cite

View full text Add to dashboard Cite

This paper presents an modification to the method of Bellman Residual Elimination (BRE) [1], [2] for approximate dynamic programming. While prior work on BRE has focused on learning an approximate policy for an underlying Markov Decision Process (MDP) when the state transition model of the MDP is known, this work proposes a model-free variant of BRE that does not require knowledge of the state transition model. Instead, state trajectories of the system, generated using simulation and/or observations of the real system in operation, are used to build stochastic approximations of the quantities needed to carry out the BRE algorithm. The resulting algorithm can be shown to converge to the policy produced by the nominal, model-based BRE algorithm in the limit of observing an infinite number of trajectories. To validate the performance of the approach, we compare model-based and model-free BRE against LSPI [3], a well-known approximate dynamic programming algorithm. Measuring performance in terms of both computational complexity and policy quality, we present results showing that BRE performs at least as well as, and sometimes significantly better than, LSPI on a standard benchmark problem.

show abstract

Approximate dynamic programming using Bellman residual elimination and Gaussian process regression

Cited by 11 publications

References 62 publications

Online Selective Kernel-Based Temporal Difference Learning

Online Selective Kernel-Based Temporal Difference Learning

Local Asymptotic Stability Analysis and Region of Attraction Estimation with Gaussian Processes

Approximate dynamic programming using model-free Bellman Residual Elimination

Contact Info

Product

Resources

About