2020
DOI: 10.48550/arxiv.2012.13037
Preprint

SPOTTER: Extending Symbolic Planning Operators through Targeted Reinforcement Learning

Abstract: Symbolic planning models allow decision-making agents to sequence actions in arbitrary ways to achieve a variety of goals in dynamic domains. However, they are typically handcrafted and tend to require precise formulations that are not robust to human error. Reinforcement learning (RL) approaches do not require such models, and instead learn domain dynamics by exploring the environment and collecting rewards. However, RL approaches tend to require millions of episodes of experience and often learn policies tha…

Cited by 3 publications (5 citation statements)
References 20 publications
“…For example, previous work has considered learning propositional [Zhang et al, 2018, Dittadi et al, 2020, Tsividis, 2019] or lifted [Arora et al, 2018, Asai and Fukunaga, 2018, Asai, 2019, Asai and Muise, 2020, Ames et al, 2018, Ahmetoglu et al, 2020] symbolic transition models, and using them with AI planners [Hoffmann, 2001, Helmert, 2006]. Other related work has used symbolic planners as managers in hierarchical RL, where low-level option policies are learned [Lyu et al, 2019, Sarathy et al, 2020, Gordon et al, 2019, Illanes et al, 2020, Yang et al, 2018, Kokel et al, 2021]. In contrast to all these, we are focused on robotic settings where the planner must handle geometric considerations in addition to the symbolic ones.…”
Section: Related Work
confidence: 99%
“…The process of collecting experiences for learning is time-consuming and the sample efficiency is low. To alleviate those issues, researchers have investigated the combination of HRL and symbolic planning to improve transferability, interpretability, and data efficiency (Ryan 2002; Leonetti, Iocchi, and Stone 2016; Yang et al 2018; Lyu et al 2019; Illanes et al 2020; Sarathy et al 2020; Lee et al 2021). In those works, the original MDP is divided into two levels.…”
Section: Introduction
confidence: 99%
“…A more realistic idea is to automatically learn action models from training data (Zhuo and Kambhampati 2017; Yang, Wu, and Jiang 2007; Ng and Petrick 2019; Martínez et al 2016; James, Rosman, and Konidaris 2020) and exploit the learnt action models to generate plans for guiding the exploration of options. Although there is indeed an approach (Sarathy et al 2020) proposed to learn action models automatically, they still need to manually define major parts of action models in advance. Besides, the planning goal in this approach is kept unchanged while in our framework it is dynamically adapted to maximize the external reward.…”
Section: Introduction
confidence: 99%
“…To alleviate those issues, researchers have investigated the combination of h-DRL and symbolic planning to improve transferability, interpretability, and data efficiency (Parr and Russell 1997; Ryan 2002; Hogg, Kuter, and Muñoz-Avila 2010; Leonetti, Iocchi, and Stone 2016; Yang et al 2018; Lyu et al 2019; Illanes et al 2020; Sarathy et al 2020). In this structure, the original MDP is divided into two levels.…”
Section: Introduction
confidence: 99%
“…A more realistic idea is to automatically learn action models from training data (Zhuo and Kambhampati 2017; Yang, Wu, and Jiang 2007; Ng and Petrick 2019; Martínez et al 2016; James, Rosman, and Konidaris 2020) and exploit the learnt action models to generate plans for guiding the exploration of options. Although there is indeed an approach (Sarathy et al 2020) proposed to learn action models automatically, they still need to manually define a major part of the models in advance. Besides, the planning goal in this approach is kept unchanged while it is dynamically adapted to maximize the external reward in our framework.…”
Section: Introduction
confidence: 99%
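
Several of the citation statements above describe the same two-level scheme: a symbolic planner acts as a high-level manager that proposes subgoals, and low-level option policies learned with RL execute primitive actions until each subgoal is satisfied. The sketch below is a minimal illustration of that division under assumed interfaces; it is not code from SPOTTER or any of the citing papers, and the planner, option policy, and termination check are hypothetical placeholders.

```python
import random

def symbolic_plan(symbolic_state, goal):
    # Hypothetical planner output: an ordered list of subgoals. In the works
    # cited above this would come from an AI planner run over a (possibly
    # learned) symbolic action model.
    return ["get_key", "open_door", goal]

def option_policy(subgoal, observation):
    # Stand-in for an RL-learned option policy for one subgoal; here it just
    # picks a random primitive action.
    return random.choice(["left", "right", "forward", "interact"])

def subgoal_achieved(subgoal, observation):
    # Placeholder termination check; normally evaluated on symbolic predicates.
    return random.random() < 0.2

def run_episode(goal="reach_goal", max_steps=100):
    observation = {}                 # toy observation; a real env would supply this
    symbolic_state = {"at": "start"}
    steps = 0
    for subgoal in symbolic_plan(symbolic_state, goal):    # high level: plan
        while not subgoal_achieved(subgoal, observation) and steps < max_steps:
            action = option_policy(subgoal, observation)   # low level: act
            steps += 1               # a real env.step(action) would update observation
    return steps

if __name__ == "__main__":
    print("episode finished after", run_episode(), "low-level steps")
```

In a full system of the kind the citing papers discuss, symbolic_plan would invoke a planner over symbolic operators, each option_policy would be trained with RL, and subgoal_achieved would test the corresponding symbolic predicate on the environment state.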