2009
DOI: 10.1016/j.neucom.2008.12.019
Gaussian process dynamic programming

Abstract: Reinforcement learning (RL) and optimal control of systems with continuous states and actions require approximation techniques in most interesting cases. In this article, we introduce Gaussian process dynamic programming (GPDP), an approximate value-function-based RL algorithm. We consider both a classic optimal control problem, where problem-specific prior knowledge is available, and a classic RL problem, where only very general priors can be used. For the classic optimal control problem, GPDP …
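The abstract's core idea — running dynamic-programming value backups while representing the value function between support states with Gaussian process regression instead of a grid — can be illustrated with a minimal sketch. Everything below (the RBF kernel settings, the deterministic 1-D dynamics, and names such as `gpdp_sketch`) is an illustrative assumption, not the paper's actual algorithm:

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.5, variance=1.0):
    """Squared-exponential kernel between state sets A (n,d) and B (m,d)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_fit_predict(X, y, Xs, noise=1e-4):
    """GP regression: posterior mean at query points Xs given data (X, y)."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    return rbf_kernel(Xs, X) @ np.linalg.solve(K, y)

def gpdp_sketch(states, actions, dynamics, reward, horizon, gamma=0.95):
    """Finite-horizon DP backup where the value function over the support
    states is modeled by GP regression rather than a lookup table."""
    X = states.reshape(-1, 1)
    V = np.zeros(len(states))                       # terminal value V_N = 0
    for _ in range(horizon):
        Q = np.empty((len(states), len(actions)))
        for j, a in enumerate(actions):
            Xn = dynamics(states, a).reshape(-1, 1)  # deterministic sketch
            # evaluate the GP value model at the successor states
            Q[:, j] = reward(states, a) + gamma * gp_fit_predict(X, V, Xn)
        V = Q.max(axis=1)                            # greedy backup
    return V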

Cited by 166 publications (161 citation statements)
References 37 publications
“…Other algorithms that use GP dynamics models in an RL setup were proposed in [20,8]. In [20,8], value function models have to be maintained, which becomes difficult in higher-dimensional state spaces.…”
Section: Related Work
Confidence: 99%
“…In [20,8], value function models have to be maintained, which becomes difficult in higher-dimensional state spaces. Although the approaches in [20,8] do long-term planning for finding a policy, they cannot directly deal with constraints in the state space (e.g., obstacles).…”
Section: Related Work
Confidence: 99%
“…FQI uses a batch-trained function approximator (FA) as the action-value function. Various types of non-linear function approximators have been successfully used with FQI, e.g., neural networks [12], Gaussian processes [2], and others [9]. In this paper, we will use Locally Weighted Projection Regression (LWPR) [15] as the value function approximator of choice, as it is a fast, robust online method that can handle large amounts of data.…”
Section: Solving the POMDP
Confidence: 99%
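The fitted Q-iteration (FQI) loop that this statement describes can be sketched generically: the `fit` argument stands in for whichever batch-trained function approximator is plugged in (a neural network, a GP, or LWPR as in the citing paper). The interface and toy regressor below are assumptions for illustration, not the cited implementation:

```python
import numpy as np

def fitted_q_iteration(transitions, actions, fit, n_iters=20, gamma=0.95):
    """Minimal FQI sketch over a batch of (s, a, r, s') transitions.
    `fit(X, y)` must return a callable regressor -- the batch FA."""
    S, A, R, Sn = transitions
    q = lambda s, a: np.zeros(len(s))               # Q_0 = 0
    for _ in range(n_iters):
        # Bellman targets computed from the current Q estimate
        targets = R + gamma * np.max(
            np.stack([q(Sn, np.full(len(Sn), b)) for b in actions], 1), 1)
        model = fit(np.column_stack([S, A]), targets)
        q = lambda s, a, m=model: m(np.column_stack([s, a]))
    return q
```

A nearest-neighbor regressor is enough to run the sketch on a small chain MDP; any batch regressor with the same `fit`/predict shape would slot in the same way.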
“…Within this field, active subsampling strategies have been used to select information-rich data through information-theoretic criteria [8], [3], [25]. Our work is particularly similar to [25], but we exploit the time-sequential nature of laser data (see Section IV-B) to form an exact and inexpensive predictive distribution for use in our decision criterion.…”
Section: Related Work
Confidence: 99%