2019
DOI: 10.48550/arxiv.1907.00664
Preprint

Learning World Graphs to Accelerate Hierarchical Reinforcement Learning

Wenling Shang,
Alex Trott,
Stephan Zheng
et al.

Abstract: In many real-world scenarios, an autonomous agent often encounters various tasks within a single complex environment. We propose to build a graph abstraction over the environment structure to accelerate the learning of these tasks. Here, nodes are important points of interest (pivotal states) and edges represent feasible traversals between them. Our approach has two stages. First, we jointly train a latent pivotal state model and a curiosity-driven goal-conditioned policy in a task-agnostic manner. Second, pro…
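To make the abstract's idea concrete, here is a minimal sketch of a world-graph abstraction used for high-level planning: nodes stand for pivotal states, weighted edges for traversals a goal-conditioned policy can execute, and planning reduces to shortest paths over the graph. All class and method names here are hypothetical illustrations, not the authors' API.

```python
import heapq
from collections import defaultdict

class WorldGraph:
    """Sketch: nodes are pivotal states; an edge (s, t, cost) records that
    a low-level goal-conditioned policy can traverse from s to t."""

    def __init__(self):
        self.edges = defaultdict(list)  # pivotal state -> [(neighbor, cost)]

    def add_traversal(self, s, t, cost=1.0):
        # Record a feasible traversal discovered during exploration.
        self.edges[s].append((t, cost))

    def plan(self, start, goal):
        # Dijkstra over pivotal states: the resulting path is a sequence of
        # subgoals handed down to the low-level policy.
        dist, prev = {start: 0.0}, {}
        pq = [(0.0, start)]
        while pq:
            d, u = heapq.heappop(pq)
            if u == goal:
                break
            if d > dist.get(u, float("inf")):
                continue
            for v, c in self.edges[u]:
                nd = d + c
                if nd < dist.get(v, float("inf")):
                    dist[v], prev[v] = nd, u
                    heapq.heappush(pq, (nd, v))
        if goal not in dist:
            return None  # no feasible traversal chain known
        path, node = [goal], goal
        while node != start:
            node = prev[node]
            path.append(node)
        return path[::-1]

g = WorldGraph()
g.add_traversal("door", "hall")
g.add_traversal("hall", "key")
print(g.plan("door", "key"))  # ['door', 'hall', 'key']
```

The point of the abstraction is that once the graph exists, a multi-step navigation task becomes a cheap graph search plus short policy rollouts between adjacent pivotal states.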

Cited by 5 publications (7 citation statements)
References 52 publications
“…Other examples of hybrid symbolic and sub-symbolic methods where a knowledge-base tool or graph-perspective enhances the neural (e.g., language [308]) model are in [309,310]. In reinforcement learning, very few examples of symbolic (graphical [311] or relational [75,312]) hybrid models exist, while in recommendation systems, for instance, explainable autoencoders are proposed [313].…”
Section: Prediction Explanation
Mentioning confidence: 99%
“…Attention Networks [267,278,330,331,332,333,334]; Representation Disentanglement [113,279,335,336,337,338,339,340,341,342]; Explanation Generation [276,343,344,345]; Hybrid Transparent and Black-box Methods: Neural-symbolic Systems [297,298,299,300], KB-enhanced Systems [24,169,301,308,309,310], Deep Formulation [264,302,303,304,305], Relational Reasoning [75,312,313,314], Case-base Reasoning [316,317,318]. Figure 11: (a) Alternative Deep Learning specific taxonomy extended from the categorization from [13]; and (b) its connection to the taxonomy in Figure 6.…”
Section: Explanation Of Deep Network Representation
Mentioning confidence: 99%
“…Keramati et al. (2018) propose a model-based framework to solve sparse-reward domains, and incorporate macro-actions in the form of fixed action sequences that can be selected as a single decision. Shang et al. (2019) use variational inference to construct a world graph similar to our region space. However, unlike our model-free method, the option policies are trained using dynamic programming, which requires knowledge of the environment dynamics.…”
Section: Related Work
Mentioning confidence: 99%
“…planning in model-based reinforcement learning) as the same future point in time can be reached with shorter and more accurate state rollouts (Pertsch et al., 2020; Zakharov et al., 2021). These benefits have recently prompted a number of proposed models that either perform transitions with temporal jumps of arbitrary length (Koutnik et al., 2014; Saxena et al., 2021), or aim to identify significant events (or key-frames) in sequential data and model the transitions between these events (Chung et al., 2017; Neitz et al., 2018; Jayaraman et al., 2018; Shang et al., 2019; Kipf et al., 2019; Kim et al., 2019; Pertsch et al., 2020; Zakharov et al., 2021).…”
Section: Introduction
Mentioning confidence: 99%
“…The large variety of different event criteria defined in these studies demonstrates the lack of a widely established definition of this concept in this literature. For instance, important events are either selected as the points in time that contain maximum information about a full video sequence (Pertsch et al., 2020), about the agent's actions (Shang et al., 2019), as the most predictable (Neitz et al., 2018; Jayaraman et al., 2018), or as the most surprising (Zakharov et al., 2021) points in time. A clearer picture can be drawn when viewing events from the perspective of cognitive psychology, where events are defined as segments of time "conceived by an observer to have a beginning and an end" (Zacks & Tversky, 2001).…”
Section: Introduction
Mentioning confidence: 99%