Search on the Replay Buffer: Bridging Planning and Reinforcement Learning
Preprint, 2019
DOI: 10.48550/arxiv.1906.05253

Abstract: The history of learning for control has been an exciting back and forth between two broad classes of algorithms: planning and reinforcement learning. Planning algorithms effectively reason over long horizons, but assume access to a local policy and distance metric over collision-free paths. Reinforcement learning excels at learning policies and the relative values of states, but fails to plan over long horizons. Despite the successes of each method in various domains, tasks that require reasoning over long hor…

Cited by 8 publications (20 citation statements) · References 39 publications
“…This can be a limitation in practice, especially in robotic domains, as any interaction with the environment requires robot time, and exploring a new environment can be challenging (indeed, Savinov et al 2018 applied manual exploration). In addition, similarly to Eysenbach et al (2019), we found that training the connectivity classifier as proposed by Savinov et al (2018) requires extensive hyperparameter tuning.…”
Section: Hallucinative Topological Memory
confidence: 84%
“…Additionally, similar to (Eysenbach et al, 2019), we find that training the graph connectivity classifier as originally proposed by (Savinov et al, 2018) requires extensive manual tuning. We replace the vanilla classifier used in SPTM with an energy-based model that employs a contrastive loss.…”
Section: Introduction
confidence: 85%
“…Pathak et al (2018) use an inverse model with forward consistency to learn from novelty seeking behavior, but lacks convergence guarantees and requires learning a complex inverse model. Semi-parametric methods (Savinov et al, 2018;Eysenbach et al, 2019) learn a policy similar to ours but do so by building a connectivity graph over the visited states in order to navigate environments, which requires large memory storage and computation time that increases with the number of states.…”
Section: Related Work
confidence: 99%
“…We experimentally compare to RL-based distance learning methods, and show that DDL attains substantially better results, especially with complex observations. Another line of prior work uses a learned distance to build a search graph over a set of visited states (Savinov et al, 2018;Eysenbach et al, 2019), which can then be used to plan to reach new states via the shortest path.…”
Section: Related Work
confidence: 99%
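The citation statements above all refer to the same mechanism: treat previously visited states (e.g. those in a replay buffer) as graph nodes, connect pairs whose learned distance estimate falls below a threshold, and plan a sequence of waypoints to a goal via shortest-path search. A minimal sketch of that idea, assuming a generic `dist_fn` standing in for the learned distance estimate (all names here are illustrative, not from the paper's code):

```python
import heapq

def shortest_path(states, dist_fn, start, goal, max_edge=3.0):
    """Dijkstra over a graph whose nodes are buffer states and whose
    edge weights come from a learned distance estimate (dist_fn).
    Edges longer than max_edge are pruned, since learned distances
    are typically only reliable locally."""
    n = len(states)
    # Build the graph: connect pairs with a small estimated distance.
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(n):
            if i != j:
                d = dist_fn(states[i], states[j])
                if d <= max_edge:
                    adj[i].append((j, d))
    # Standard Dijkstra from start to goal.
    best = {start: 0.0}
    parent = {}
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > best.get(u, float("inf")):
            continue
        for v, w in adj[u]:
            nd = d + w
            if nd < best.get(v, float("inf")):
                best[v] = nd
                parent[v] = u
                heapq.heappush(pq, (nd, v))
    if goal not in best:
        return None  # goal unreachable under the pruned graph
    # Reconstruct the waypoint sequence (indices into states).
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1]
```

The all-pairs graph construction is what the citing papers flag as costly: memory and compute grow with the number of stored states, which motivates their proposed alternatives.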