2019
DOI: 10.1609/aaai.v33i01.33013494

How to Combine Tree-Search Methods in Reinforcement Learning

Abstract: Finite-horizon lookahead policies are abundantly used in Reinforcement Learning and demonstrate impressive empirical success. Usually, the lookahead policies are implemented with specific planning methods such as Monte Carlo Tree Search (e.g. in AlphaZero (Silver et al. 2017b)). Referring to the planning problem as tree search, a reasonable practice in these implementations is to back up the value only at the leaves while the information obtained at the root is not leveraged other than for updating the policy.…
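To make the tree-search terminology concrete, below is a minimal sketch (not the paper's algorithm) of an h-step lookahead policy over a known model, in which the current value estimate V is backed up only at the leaves of the search tree. The `mdp.actions` / `mdp.transitions` interface, the tabular V, and the function names are hypothetical placeholders, not an API from the paper.

```python
def lookahead_value(mdp, V, s, depth, gamma=0.99):
    """Optimal depth-step lookahead value of state s, bootstrapping with the
    estimate V only at the leaves of the (exhaustively expanded) search tree.

    Hypothetical interface: mdp.actions(s) -> iterable of actions,
    mdp.transitions(s, a) -> iterable of (prob, next_state, reward) tuples.
    """
    if depth == 0:
        return V[s]  # leaf node: back up the current value estimate
    return max(
        sum(p * (r + gamma * lookahead_value(mdp, V, s2, depth - 1, gamma))
            for p, s2, r in mdp.transitions(s, a))
        for a in mdp.actions(s)
    )


def h_step_lookahead_policy(mdp, V, s, h, gamma=0.99):
    """Greedy root action of an h-step lookahead (tree-search) policy.

    Returns both the chosen action and the h-step value computed at the root;
    the abstract's point concerns whether this root quantity is leveraged
    beyond merely selecting the action.
    """
    q = {a: sum(p * (r + gamma * lookahead_value(mdp, V, s2, h - 1, gamma))
                for p, s2, r in mdp.transitions(s, a))
         for a in mdp.actions(s)}
    best_action = max(q, key=q.get)
    return best_action, q[best_action]
```

In a planning-and-learning loop, the root value returned here could also serve as a training target for V, which, as the abstract notes, is typically not done when only the leaf values are backed up.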

Cited by 19 publications (36 citation statements) · References 9 publications
“…Our experiment is even more significant considering that Efroni et al. [14] recently proved that the Bellman update should be replaced so that contraction is guaranteed for tree-based policies only when the value at the leaves is backed up. However, this theory was not supported by empirical evidence beyond a toy maze.…”
Section: Training With Tree Search
confidence: 99%
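For context, the contraction property the quoted statement refers to can be stated generically; the following is the textbook fact about the h-step optimal Bellman operator, not a restatement of the specific result in [14].

```latex
% One-step optimal Bellman operator T and its h-fold composition T^h
(TV)(s) = \max_{a}\Big[ r(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V(s') \Big],
\qquad T^{h} V = \underbrace{T \circ \cdots \circ T}_{h\ \text{times}}\, V .
% Since T is a gamma-contraction in the sup-norm, the h-step backup contracts at rate gamma^h
\lVert T^{h} V - T^{h} U \rVert_{\infty} \le \gamma^{h}\, \lVert V - U \rVert_{\infty}.
```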
“…We find this method to be beneficial in several of the games we tested. In the experiments below, we treat the correction from [14] as a hyper-parameter and include ablation studies of it in Appendix C.3.…”
Section: Training With Tree Search
confidence: 99%
“…• Multi-step approximate dynamic programming: More complex integrations use a form of multi-step approximate dynamic programming (Efroni et al., 2018, 2019).…”
Section: Model-based Reinforcement Learning
confidence: 99%
“…Several recent works rigorously analyzed the properties of multi-step lookahead in common RL schemes (Efroni et al., 2018a,b, 2019, 2020; Hallak et al., 2021). This and other related literature studied a fixed planning horizon chosen in advance.…”
Section: Introduction
confidence: 99%