2009
DOI: 10.1016/j.neucom.2008.12.019
Gaussian process dynamic programming

Abstract: Reinforcement learning (RL) and optimal control of systems with continuous states and actions require approximation techniques in most interesting cases. In this article, we introduce Gaussian process dynamic programming (GPDP), an approximate value-function-based RL algorithm. We consider both a classic optimal control problem, where problem-specific prior knowledge is available, and a classic RL problem, where only very general priors can be used. For the classic optimal control problem, GPDP …
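The abstract's core idea — running dynamic-programming value backups while representing the value function between support states with Gaussian process regression instead of a grid — can be illustrated with a minimal sketch. Everything below (the RBF kernel settings, the deterministic 1-D dynamics, and names such as `gpdp_sketch`) is an illustrative assumption, not the paper's actual algorithm:

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.5, variance=1.0):
    """Squared-exponential kernel between state sets A (n,d) and B (m,d)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_fit_predict(X, y, Xs, noise=1e-4):
    """GP regression: posterior mean at query points Xs given data (X, y)."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    return rbf_kernel(Xs, X) @ np.linalg.solve(K, y)

def gpdp_sketch(states, actions, dynamics, reward, horizon, gamma=0.95):
    """Finite-horizon DP backup where the value function over the support
    states is modeled by GP regression rather than a lookup table."""
    X = states.reshape(-1, 1)
    V = np.zeros(len(states))                       # terminal value V_N = 0
    for _ in range(horizon):
        Q = np.empty((len(states), len(actions)))
        for j, a in enumerate(actions):
            Xn = dynamics(states, a).reshape(-1, 1)  # deterministic sketch
            # evaluate the GP value model at the successor states
            Q[:, j] = reward(states, a) + gamma * gp_fit_predict(X, V, Xn)
        V = Q.max(axis=1)                            # greedy backup
    return V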

Cited by 166 publications (161 citation statements)
References 37 publications
“…Other algorithms that use GP dynamics models in an RL setup were proposed in [20,8]. In [20,8], value function models have to be maintained, which becomes difficult in higher-dimensional state spaces.…”
Section: Related Work
Confidence: 99%
“…In [20,8], value function models have to be maintained, which becomes difficult in higher-dimensional state spaces. Although the approaches in [20,8] do long-term planning for finding a policy, they cannot directly deal with constraints in the state space (e.g., obstacles).…”
Section: Related Work
Confidence: 99%
“…FQI uses a batch-trained function approximator (FA) as the action-value function. Various types of non-linear function approximators have been successfully used with FQI, e.g., neural networks [12], Gaussian processes [2], and others [9]. In this paper, we will use Locally Weighted Projection Regression (LWPR) [15] as the value function approximator of choice, as it is a fast, robust online method that can handle large amounts of data.…”
Section: Solving the POMDP
Confidence: 99%
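The fitted Q-iteration (FQI) loop that this statement describes can be sketched generically: the `fit` argument stands in for whichever batch-trained function approximator is plugged in (a neural network, a GP, or LWPR as in the citing paper). The interface and toy regressor below are assumptions for illustration, not the cited implementation:

```python
import numpy as np

def fitted_q_iteration(transitions, actions, fit, n_iters=20, gamma=0.95):
    """Minimal FQI sketch over a batch of (s, a, r, s') transitions.
    `fit(X, y)` must return a callable regressor -- the batch FA."""
    S, A, R, Sn = transitions
    q = lambda s, a: np.zeros(len(s))               # Q_0 = 0
    for _ in range(n_iters):
        # Bellman targets computed from the current Q estimate
        targets = R + gamma * np.max(
            np.stack([q(Sn, np.full(len(Sn), b)) for b in actions], 1), 1)
        model = fit(np.column_stack([S, A]), targets)
        q = lambda s, a, m=model: m(np.column_stack([s, a]))
    return q
```

A nearest-neighbor regressor is enough to run the sketch on a small chain MDP; any batch regressor with the same `fit`/predict shape would slot in the same way.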
“…Within this field, active subsampling strategies have been used to select information-rich data through information-theoretic criteria [8], [3], [25]. Our work is particularly similar to [25], but we exploit the time-sequential nature of laser data (see Section IV-B) to form an exact and inexpensive predictive distribution for use in our decision criterion.…”
Section: Related Work
Confidence: 99%