2010
DOI: 10.1609/aaai.v24i1.7689

Integrating Sample-Based Planning and Model-Based Reinforcement Learning

Abstract: Recent advancements in model-based reinforcement learning have shown that the dynamics of many structured domains (e.g. DBNs) can be learned with tractable sample complexity, despite their exponentially large state spaces. Unfortunately, these algorithms all require access to a planner that computes a near optimal policy, and while many traditional MDP algorithms make this guarantee, their computation time grows with the number of states. We show how to replace these over-matched planners with a class of samp…


Cited by 55 publications (16 citation statements)
References 15 publications
“…Proof sketch. The proof is identical to that of Proposition 3 by Walsh et al. (Walsh, Goschin, and Littman 2010), by noting that every trial of FSSS-Aux has to end at a node with zero bound gap. If such a node is not a leaf, it would not be selected in the first place, because it cannot be the node with the widest bound gap.…”
Section: FSSS-Aux: FSSS with π-Guided Auxiliary Arms
confidence: 63%
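
To make the bound-gap argument concrete, here is a minimal sketch (with hypothetical node fields, not the authors' code) of the widest-gap descent rule the proof relies on: a non-leaf node with zero bound gap cannot be the argmax while a sibling still has a positive gap, so a trial can only stop where the gap has collapsed.

```python
# Minimal sketch of the widest-bound-gap descent the proof sketch relies on.
# Node fields (`lower`, `upper`, `children`) are assumptions for illustration:
# each node carries interval bounds [lower, upper] on its value estimate, and
# `children` is a list of successor nodes.

def widest_gap(nodes):
    """Select the successor with the largest bound gap upper - lower."""
    return max(nodes, key=lambda n: n.upper - n.lower)

def descend(node):
    """Follow widest-gap successors until the gap collapses to zero."""
    while node.children and node.upper > node.lower:
        node = widest_gap(node.children)
    return node  # trial terminus: an unexpanded leaf or a zero-gap node
```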
“…FSSS is the latest successor of SS that is able to combine the performance guarantee of SS with selective sampling (Walsh, Goschin, and Littman 2010). Instead of directly forming point estimates of the Q- and V-values in a look-ahead tree, it maintains interval estimates that then allow it to guide the sampling toward promising and/or unexplored branches of the look-ahead tree, based on the widths and upper bounds of the intervals.…”
Section: Forward Search Sparse Sampling (FSSS)
confidence: 99%
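
As a concrete illustration of the interval estimates described above, here is a hedged sketch under stated assumptions: a generative model `sim(s, a) -> (next_state, reward)`, rewards in [0, 1], and a fixed search horizon. It mirrors the FSSS idea of descending on upper bounds and bound-gap widths, but all names and structure are invented for the example, not taken from the paper.

```python
# Hedged sketch of FSSS-style interval bookkeeping in a look-ahead tree.
# Assumptions (not from the paper): sim(s, a) -> (next_state, reward),
# rewards in [0, 1], a fixed horizon; all names are illustrative.

GAMMA = 0.95
V_MIN, V_MAX = 0.0, 1.0 / (1.0 - GAMMA)     # a priori bounds on values

class Node:
    def __init__(self, state, depth):
        self.state, self.depth = state, depth
        self.lower, self.upper = V_MIN, V_MAX   # interval estimate of V(state)
        self.children = {}                      # action -> list of (reward, Node)

def fsss_trial(node, sim, actions, width, horizon):
    """One trial: descend on upper bounds and bound gaps, then back up."""
    if node.depth == horizon:                   # leaf: no future reward counted
        node.lower = node.upper = 0.0
        return
    if not node.children:                       # expand: `width` samples per action
        for a in actions:
            node.children[a] = [(r, Node(s2, node.depth + 1))
                                for s2, r in [sim(node.state, a)
                                              for _ in range(width)]]

    def q(a, bound):                            # sampled Bellman backup of a bound
        kids = node.children[a]
        return sum(r + GAMMA * getattr(c, bound) for r, c in kids) / len(kids)

    best = max(actions, key=lambda a: q(a, "upper"))          # optimistic action
    _, child = max(node.children[best],
                   key=lambda rc: rc[1].upper - rc[1].lower)  # widest-gap successor
    if child.upper > child.lower:
        fsss_trial(child, sim, actions, width, horizon)
    node.lower = max(q(a, "lower") for a in actions)          # back up the interval
    node.upper = max(q(a, "upper") for a in actions)
```

Repeated trials from the root progressively tighten the root's interval, and the zero-bound-gap termination property used in the proof sketch above falls out of the widest-gap selection.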
“…Interactive learning has also been posed as an RL problem with an underlying MDP (Sutton and Barto 1998). Approaches for efficient RL in dynamic domains include sample-based planning algorithms (Walsh, Goschin, and Littman 2010), and relational RL (RRL), which uses relational representations and regression for Q-function generalization (Dzeroski, Raedt, and Driessens 2001; Tadepalli, Givan, and Driessens 2004). However, most RRL algorithms focus on planning, limit generalization to a single planning task, or do not support the desired commonsense reasoning capabilities.…”
Section: Related Work
confidence: 99%
“…Andrew et al. [7] introduced a shaping function into reinforcement learning, adding a heuristic value to agents' returns, which effectively improved convergence speed. Asmuth et al. [8] used a potential-field function as prior knowledge to guide the reinforcement learning process and proved the effectiveness of the algorithm.…”
Section: Introduction
confidence: 99%
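
The shaping idea summarized in this statement is commonly written in potential-based form, where a heuristic potential phi supplied as prior knowledge is folded into the reward. A minimal sketch with a toy potential (all names hypothetical):

```python
# Minimal sketch of potential-based reward shaping: the agent learns from
# r + gamma * phi(s') - phi(s) instead of r alone; in this potential-based
# form the shaped problem keeps the same optimal policies. `phi` is a
# hypothetical heuristic potential supplied as prior knowledge.

def shaped_reward(r, s, s_next, phi, gamma=0.99):
    """Environment reward augmented with a potential-difference bonus."""
    return r + gamma * phi(s_next) - phi(s)

# Toy usage: a distance-to-goal potential on a 1-D chain (illustrative only).
goal = 10
phi = lambda s: -abs(goal - s)
print(shaped_reward(0.0, 3, 4, phi))   # a step toward the goal earns a positive bonus
```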