2013
DOI: 10.1002/acs.2387

Optimized look‐ahead tree policies: a bridge between look‐ahead tree policies and direct policy search

Abstract: Direct policy search (DPS) and look-ahead tree (LT) policies are two widely used classes of techniques for producing high-performance policies for sequential decision-making problems. A crucial issue in making DPS approaches work well is selecting a space of parameterized policies appropriate to the targeted problem. A fundamental issue in LT approaches is that, to make good decisions, such policies must develop very large look-ahead trees, which may require excessive online computational resources.…
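To make the trade-off described in the abstract concrete, here is a minimal sketch of a generic look-ahead tree policy. It assumes a deterministic transition function f, a reward function rho, a finite action set, and a hard budget on node expansions; all names and the best-first structure are illustrative placeholders, not the paper's exact formulation. The online cost is exactly the expansion budget, which is the resource the paper seeks to reduce.

```python
import heapq
import itertools
from typing import Callable, List

def lt_policy(
    x0,                          # current state
    actions: List,               # finite action set (assumed)
    f: Callable,                 # deterministic transition: f(x, a) -> next state
    rho: Callable,               # reward function: rho(x, a) -> float
    budget: int,                 # max node expansions = the online cost
    score: Callable = None,      # node-expansion heuristic (generic by default)
):
    """Generic look-ahead tree (LT) policy: grow a best-first tree of at
    most `budget` node expansions from x0, then return the first action
    on the path to the best node found."""
    if score is None:
        # Generic, domain-agnostic heuristic: expand the node with the
        # largest cumulative reward collected so far.
        score = lambda node: node[0]

    tie = itertools.count()      # tie-breaker so states are never compared
    root = (0.0, x0, None)       # node = (cum_reward, state, first_action)
    frontier = [(-score(root), next(tie), root)]
    best_reward, best_action = float("-inf"), actions[0]

    for _ in range(budget):
        if not frontier:
            break
        _, _, (r, x, a0) = heapq.heappop(frontier)
        for a in actions:        # expansion: one child per action
            child = (r + rho(x, a), f(x, a), a if a0 is None else a0)
            if child[0] > best_reward:
                best_reward, best_action = child[0], child[2]
            heapq.heappush(frontier, (-score(child), next(tie), child))

    return best_action           # decision for the current time step only
```

With a generic score like the default above, good decisions may require a very large budget; the OLT idea quoted below is to learn the score offline so that a small budget suffices.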

Cited by 4 publications (18 citation statements)
References 48 publications

Citation statements:
“…However, OLT policies differ from LT ones in one significant way: whereas LT uses a generic node expansion heuristic, OLT relies on a parameterized node expansion heuristic exp_score(n; θ) where the parameters θ are specifically optimized in an offline learning phase for the given target domain (f, ρ). The main advantage of OLT over LT is that this optimization can lead to a substantial reduction of the number of node expansions necessary to output good control actions (as was empirically demonstrated in [7], [9]), meaning that OLT can achieve the same performance as LT at a significantly lower online cost. (The disadvantage of OLT is of course that it needs this prior offline learning.)…”
Section: Optimized Look-ahead Tree Policies
Confidence: 96%
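The quoted passage hinges on the parameterized expansion heuristic exp_score(n; θ). Below is a minimal sketch of how such a heuristic might look and how it plugs into the lt_policy sketch above; the linear feature-based form and the feature map phi are assumptions for illustration, not the paper's exact parameterization.

```python
import numpy as np

def exp_score(node, theta, phi):
    """Parameterized expansion heuristic exp_score(n; theta): here a
    linear score over node features, one simple illustrative choice."""
    cum_reward, state, _first_action = node
    return float(np.dot(theta, phi(cum_reward, state)))

def olt_policy(x0, actions, f, rho, budget, theta, phi):
    """OLT policy: same online tree search as lt_policy above, but the
    expansion order is driven by exp_score(.; theta), with theta tuned
    offline for the target domain (f, rho)."""
    return lt_policy(x0, actions, f, rho, budget,
                     score=lambda node: exp_score(node, theta, phi))
```

The online machinery is unchanged; only the expansion order differs, which is why a well-tuned θ can deliver the same decision quality under a much smaller budget.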
“…Having made this distinction, we can characterize DPS and LT as lying at opposite ends of the offline complexity / online complexity spectrum [7]. DPS techniques typically require huge offline resources for two reasons.…”
Section: Goal: Constrained Online Budget
Confidence: 99%
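The offline phase that sits at the expensive end of this spectrum can be sketched as a derivative-free search over θ, where each candidate is scored by simulated rollouts of the resulting policy. Plain random search is used here purely for illustration; the optimizer actually used in the paper may differ, and simulate_return is a hypothetical helper the caller must supply.

```python
import random

def offline_optimize(theta_dim, simulate_return, n_candidates=1000, sigma=1.0):
    """Offline learning phase (sketch): derivative-free random search
    over theta.  simulate_return(theta) is assumed to run the OLT policy
    with a small online budget over simulated episodes of the target
    domain (f, rho) and return the total reward.  The offline cost is
    large (full rollouts per candidate); the online cost then stays
    fixed at the small expansion budget."""
    best_theta, best_return = None, float("-inf")
    for _ in range(n_candidates):
        theta = [random.gauss(0.0, sigma) for _ in range(theta_dim)]
        ret = simulate_return(theta)   # expensive: simulated rollouts
        if ret > best_return:
            best_theta, best_return = theta, ret
    return best_theta
```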