Proceedings of the 22nd International Conference on Machine Learning (ICML '05), 2005
DOI: 10.1145/1102351.1102377

Reinforcement learning with Gaussian processes

Cited by 260 publications (268 citation statements). References 5 publications.

“…Engel et al. [2] approached the problem from the viewpoint of temporal difference learning (GPTD) and later extended this scheme to deal with stochastic state transitions, to improve action selection, and to learn Q-values without an explicit transition model (GPSARSA) [3]. Their approach was successfully applied to the problem of learning complex manipulation policies [4].…”
Section: Related Work
confidence: 99%
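
As a concrete illustration of the GPTD idea summarized in the excerpt above, the following sketch conditions a Gaussian process prior on the value function on the temporal differences observed along a single trajectory. It is a minimal batch version: the helper names (rbf_kernel, gptd_posterior), the white observation noise, and the toy chain data are illustrative assumptions, and the online sparsification and correlated-noise model of the actual GPTD/GPSARSA algorithms are omitted.

import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    # Squared-exponential kernel between two sets of row-vector inputs.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def gptd_posterior(states, rewards, gamma=0.95, noise=0.1, lengthscale=1.0):
    # Model: r_t = V(x_t) - gamma * V(x_{t+1}) + eps_t, with a GP prior on V.
    # Returns a function mapping query states to posterior mean values.
    T = len(rewards)                          # T transitions over T+1 visited states
    K = rbf_kernel(states, states, lengthscale)
    H = np.zeros((T, T + 1))
    H[np.arange(T), np.arange(T)] = 1.0
    H[np.arange(T), np.arange(T) + 1] = -gamma
    G = H @ K @ H.T + noise ** 2 * np.eye(T)  # covariance of the observed rewards
    alpha = np.linalg.solve(G, rewards)

    def value(query):
        k_star = rbf_kernel(np.atleast_2d(query), states, lengthscale)
        return (k_star @ H.T @ alpha).squeeze()

    return value

# Toy usage: a 1-D chain where only the final transition is rewarded.
states = np.linspace(0.0, 1.0, 6).reshape(-1, 1)
rewards = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
V = gptd_posterior(states, rewards)
print(V(np.array([[0.8]])))                   # posterior mean value near the rewarding end
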
“…This, however, can lead to discretization errors or, when fine-grained grids are used, requires a huge amount of memory and a time-consuming exploration process. Therefore, function approximation techniques that directly operate on the continuous space, such as neural networks [1], [15], kernel methods [9], or Gaussian processes [13], [3], have been proposed as powerful alternatives to the discrete approximations of the continuous Q-function. From a regression perspective, these techniques seek to model the dependency…”
Section: B. Learning the Q-function
confidence: 99%
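
To make the regression perspective concrete, the sketch below fits a Q-function over joint (state, action) inputs with kernel ridge regression inside a fitted-Q loop. It is not any one of the cited methods specifically; the function names, the squared-exponential kernel, the ridge term, and the finite action set are illustrative assumptions.

import numpy as np

def rbf(A, B, ls=0.5):
    # Squared-exponential kernel between row-vector inputs.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def fitted_q_kernel(s, a, r, s_next, actions, gamma=0.9, reg=1e-3, iters=30):
    # Kernel-ridge fitted-Q iteration: continuous states, finite action set.
    # s, s_next: (N, d) state arrays; a: (N, 1) actions taken; r: (N,) rewards.
    X = np.hstack([s, a])                     # joint (state, action) inputs
    K = rbf(X, X)
    alpha = np.zeros(len(X))
    for _ in range(iters):
        # Bootstrapped targets r + gamma * max_a' Q(s', a') under the current fit.
        q_next = np.stack([
            rbf(np.hstack([s_next, np.full((len(s_next), 1), act)]), X) @ alpha
            for act in actions
        ], axis=1)
        y = r + gamma * q_next.max(axis=1)
        alpha = np.linalg.solve(K + reg * np.eye(len(X)), y)

    def q(state, action):
        x = np.hstack([np.atleast_2d(state), [[action]]])
        return (rbf(x, X) @ alpha).item()

    return q

# Usage sketch: q = fitted_q_kernel(s, a, r, s_next, actions=[0.0, 1.0]); q(some_state, 1.0)
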
“…The parameters of the value function are usually learned from data, as in the case of incremental TD and the Least-Squares TD (LSTD) methods [4,5]. Also, kernelized reinforcement learning methods have received a lot of attention, employing kernel techniques in standard RL methods [6] and Gaussian processes for approximating the value function [7][8][9].…”
Section: Introduction
confidence: 99%
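
The batch LSTD solve referenced in the excerpt can be written in a few lines, assuming a linear value function V(s) ≈ φ(s)ᵀw; the small ridge term and the function name are illustrative assumptions added for numerical stability.

import numpy as np

def lstd(phi, phi_next, rewards, gamma=0.95, reg=1e-6):
    # Batch LSTD: weights w such that V(s) is approximated by phi(s) @ w.
    # phi, phi_next: (N, d) features of s_t and s_{t+1}; rewards: (N,).
    A = phi.T @ (phi - gamma * phi_next) + reg * np.eye(phi.shape[1])
    b = phi.T @ rewards
    return np.linalg.solve(A, b)
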
“…Furthermore, Gaussian kernels have 'centers', which alleviates the difficulty of basis subset choice, e.g., uniform allocation (Lagoudakis and Parr 2003) or sample-dependent allocation (Engel et al. 2005). In this paper, we therefore define Gaussian kernels on graphs (which we call geodesic Gaussian kernels) and propose using them for value function approximation (see Fig.…”
Section: Introduction
confidence: 99%
“…Our definition of Gaussian kernels on graphs employs the shortest paths between states rather than the Euclidean distance, which can be computed efficiently using the Dijkstra algorithm (Dijkstra 1959; Fredman and Tarjan 1987). Moreover, an effective use of Gaussian kernels opens up the possibility to exploit the recent advances in using Gaussian processes for temporal-difference learning (Engel et al. 2005).…”
Section: Introduction
confidence: 99%
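
The geodesic Gaussian kernel construction described in the last two excerpts can be sketched as follows: shortest-path distances from a set of center states are computed with Dijkstra's algorithm (here via SciPy's csgraph routine) and plugged into a Gaussian kernel to form basis features for value function approximation. The function name, the SciPy dependency, and the toy chain graph are illustrative assumptions.

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

def geodesic_gaussian_features(adjacency, centers, sigma=1.0):
    # adjacency: (n, n) matrix of edge costs between states (0 = no edge);
    # centers:   indices of the states used as kernel centers.
    # Returns an (n, len(centers)) feature matrix whose (s, i) entry is
    # exp(-SP(s, c_i)^2 / (2 sigma^2)), with SP the shortest-path distance.
    graph = csr_matrix(adjacency)
    sp = dijkstra(graph, directed=False, indices=centers)   # (len(centers), n)
    return np.exp(-0.5 * (sp.T / sigma) ** 2)

# Toy usage on a 4-state chain 0-1-2-3 with unit edge costs.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
features = geodesic_gaussian_features(adj, centers=[0, 3], sigma=1.0)
print(features)
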