Geodesic Gaussian kernels for value function approximation

Sugiyama, Masashi; Hachiya, Hirotaka; Towell, Christopher; Vijayakumar, Sethu

doi:10.1007/s10514-008-9095-6

Cited by 27 publications

(18 citation statements)

References 17 publications

“…Thus, accurately approximating the value function is a challenge in the value function based approach. So far, various machine learning techniques have been employed for better value function approximation, such as least-squares approximation [12], manifold learning [17], efficient sample reuse [6], active learning [2], and robust learning [16].…”

Section: Policy Iteration vs. Policy Search (mentioning)

Model-based policy gradients with parameter-based exploration by least-squares conditional density estimation

et al. 2014

Self Cite

The goal of reinforcement learning (RL) is to let an agent learn an optimal control policy in an unknown environment so that future expected rewards are maximized. The model-free RL approach directly learns the policy based on data samples. Although using many samples tends to improve the accuracy of policy learning, collecting a large number of samples is often expensive in practice. On the other hand, the model-based RL approach first estimates the transition model of the environment and then learns the policy based on the estimated transition model. Thus, if the transition model is accurately learned from a small amount of data, the model-based approach can perform better than the model-free approach. In this paper, we propose a novel model-based RL method by combining a recently proposed model-free policy search method called policy gradients with parameter-based exploration and the state-of-the-art transition model estimator called least-squares conditional density estimation. Through experiments, we demonstrate the practical usefulness of the proposed method.
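
The abstract builds on policy gradients with parameter-based exploration (PGPE), in which exploration noise is applied to the policy parameters rather than to individual actions: parameters theta are drawn from a Gaussian N(mu, sigma^2), and the expected return is improved by following likelihood-ratio gradients with respect to the hyper-parameters mu and sigma. The sketch below is plain model-free PGPE for a single parameter, not the paper's model-based method; `toy_return`, the learning rate, the number of samples per iteration, and the floor on sigma are all hypothetical choices for illustration. In the model-based setting the abstract describes, the returns would instead come from trajectories simulated under the estimated transition model.

```python
import random


def toy_return(theta):
    """Hypothetical stand-in for an episode's return; maximized at theta = 3."""
    return -(theta - 3.0) ** 2


def pgpe(episodes_per_iter=20, iterations=300, lr=0.05, seed=0):
    """Minimal model-free PGPE sketch for one policy parameter.

    Each iteration samples theta ~ N(mu, sigma^2), evaluates the returns, and
    follows the likelihood-ratio gradient of the expected return with respect
    to the hyper-parameters (mu, sigma).
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0
    for _ in range(iterations):
        thetas = [rng.gauss(mu, sigma) for _ in range(episodes_per_iter)]
        returns = [toy_return(t) for t in thetas]
        baseline = sum(returns) / len(returns)  # mean return reduces variance
        grad_mu = grad_sigma = 0.0
        for t, r in zip(thetas, returns):
            adv = r - baseline
            # d/dmu log N(t; mu, sigma^2) = (t - mu) / sigma^2
            grad_mu += adv * (t - mu) / sigma ** 2
            # d/dsigma log N(t; mu, sigma^2) = ((t - mu)^2 - sigma^2) / sigma^3
            grad_sigma += adv * ((t - mu) ** 2 - sigma ** 2) / sigma ** 3
        mu += lr * grad_mu / episodes_per_iter
        # Keep the exploration width strictly positive.
        sigma = max(1e-3, sigma + lr * grad_sigma / episodes_per_iter)
    return mu, sigma


mu, sigma = pgpe()
print(f"mu = {mu:.3f}, sigma = {sigma:.4f}")
```

Because exploration lives in parameter space, each sampled theta is held fixed for a whole episode, and the exploration width sigma is itself learned; on this toy objective it shrinks as mu approaches the maximizer.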

“…Tobias and Daniel proposed a LSTD approach based on SVMs [96]. Several researchers have investigated designing specialized kernels that exploit manifold structure in the state space [90,91,59,60,10,9,87]. This work represents exciting progress; however, the field of kernel-based ADP has developed only recently, and there remain numerous possibilities that are yet unexplored.…”

Section: Literature Review (mentioning)

“…Investigation of manifold-based kernels, and their relationship to n-stage BRE. A number of researchers have proposed using kernels that exploit manifold structure on the state space as a means of devising cost approximation algorithms [90,91,59,60,10,9,87]. We believe that these kernels are particularly appropriate for use in our BRE algorithms, and propose to test these manifold-based kernels in several BRE test problems.…”

Section: Further BRE Algorithm Development/Extension (mentioning)

Approximate dynamic programming using Bellman residual elimination and Gaussian process regression

Bethke, How

2009

2009 American Control Conference

The overarching goal of the thesis is to devise new strategies for multi-agent planning and control problems, especially in the case where the agents are subject to random failures, maintenance needs, or other health management concerns, or in cases where the system model is not perfectly known. We argue that dynamic programming techniques, in particular Markov Decision Processes (MDPs), are a natural framework for addressing these planning problems, and present an MDP problem formulation for a persistent surveillance mission that incorporates stochastic fuel usage dynamics and the possibility for randomly-occurring failures into the planning process. We show that this problem formulation and its optimal policy lead to good mission performance in a number of real-world scenarios. Furthermore, an on-line, adaptive solution framework is developed that allows the planning system to improve its performance over time, even in the case where the true system model is uncertain or time-varying. Motivated by the difficulty of solving the persistent mission problem exactly when the number of agents becomes large, we then develop a new family of approximate dynamic programming algorithms, called Bellman Residual Elimination (BRE) methods, which can be employed to approximately solve large-scale MDPs. We analyze these methods and prove a number of desirable theoretical properties about them, including reduction to exact policy iteration under certain conditions. Finally, we apply these BRE methods to large-scale persistent surveillance problems and show that they yield good performance, and furthermore, that they can be successfully integrated into the adaptive planning framework.

“…Recently, more sophisticated methods of constructing suitable basis functions have been proposed, which effectively make use of the graph structure induced by MDPs [5]. In this section, we introduce a novel way of constructing basis functions by incorporating the graph structure; while relation to the existing graph-based methods is discussed in the separate report [14].…”

Section: Gaussian Kernels on Graphs (mentioning)

Value Function Approximation on Non-Linear Manifolds for Robot Motor Control

Sugiyama, Hachiya, Towell et al. 2007

Proceedings 2007 IEEE International Conference on Robotics and Automation

Self Cite

The least-squares approach works efficiently for value function approximation, given appropriate basis functions. Because of its smoothness, the Gaussian kernel is a popular and useful choice of basis function. However, it does not allow for discontinuities, which typically arise in real-world reinforcement learning tasks. In this paper, we propose a new basis function based on geodesic Gaussian kernels, which exploits the non-linear manifold structure induced by Markov decision processes. The usefulness of the proposed method is demonstrated in simulated robot-arm control and Khepera robot navigation.
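
To make the contrast in the abstract concrete, here is a minimal sketch, assuming the geodesic Gaussian kernel has the form K(s, c) = exp(-SP(s, c)^2 / (2 sigma^2)), where SP(s, c) is the shortest-path length between state s and center c on a graph over the state space. The 5x5 grid, the wall, sigma = 2, and all function names are hypothetical; the abstract describes a graph induced by the MDP itself, for which this 4-connected grid with unit-cost edges merely stands in.

```python
import heapq
import math


def grid_graph(width, height, walls):
    """Hypothetical 4-connected grid of states; wall cells are removed, edges cost 1."""
    adj = {}
    for x in range(width):
        for y in range(height):
            if (x, y) in walls:
                continue
            adj[(x, y)] = []
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < width and 0 <= ny < height and (nx, ny) not in walls:
                    adj[(x, y)].append(((nx, ny), 1.0))
    return adj


def shortest_paths(adj, source):
    """Dijkstra's algorithm: shortest-path length from source to each reachable state."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def geodesic_gaussian(adj, center, sigma):
    """Geodesic Gaussian basis function centered at `center`, evaluated at every state.

    Unreachable states get 0, since their shortest-path distance is infinite.
    """
    dist = shortest_paths(adj, center)
    return {s: math.exp(-dist[s] ** 2 / (2 * sigma ** 2)) if s in dist else 0.0
            for s in adj}


def euclidean_gaussian(s, center, sigma):
    """Ordinary Gaussian kernel on state coordinates, for comparison."""
    sq = (s[0] - center[0]) ** 2 + (s[1] - center[1]) ** 2
    return math.exp(-sq / (2 * sigma ** 2))


# A wall at x = 2 blocks rows y = 0..3; the only gap is at (2, 4).
walls = {(2, y) for y in range(4)}
adj = grid_graph(5, 5, walls)
center, across_wall = (1, 0), (3, 0)
k_geo = geodesic_gaussian(adj, center, sigma=2.0)[across_wall]
k_euc = euclidean_gaussian(across_wall, center, sigma=2.0)
print(k_geo, k_euc)
```

The two states are only two cells apart in Euclidean terms, so the ordinary Gaussian gives exp(-0.5), about 0.61, but the shortest path around the wall is 10 steps, so the geodesic kernel gives exp(-12.5), about 3.7e-6. A value function built as a least-squares combination of such basis functions can therefore change sharply across the wall instead of being smoothed over it.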
