2020
DOI: 10.48550/arxiv.2010.13146
Preprint

XLVIN: eXecuted Latent Value Iteration Nets

Abstract: Value Iteration Networks (VINs) have emerged as a popular method to perform implicit planning within deep reinforcement learning, enabling performance improvements on tasks requiring long-range reasoning and understanding of environment dynamics. This came with several limitations, however: the model is not explicitly incentivised to perform meaningful planning computations, the underlying state space is assumed to be discrete, and the Markov decision process (MDP) is assumed fixed and known. We propose eXecut…

Cited by 3 publications (3 citation statements)
References 20 publications

“…As in other areas of machine learning, RL has seen increasing interest in forgoing the use of explicit models, instead structuring the policy to include a planning inductive bias such that an agent can perform implicit planning (Tamar et al., 2016; Deac et al., 2020; Amos et al., 2018; Jin et al., 2020). A classic example is value iteration networks (Tamar et al., 2016), which replace the explicit value iteration algorithm with an inductive bias in the form of a convolutional neural network (Fukushima, 1988; LeCun et al., 1989).…”
Section: End-to-end Sysid and Control (mentioning)
confidence: 99%
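For context on the quoted statement, here is a minimal NumPy sketch of the explicit (tabular) value iteration algorithm that VINs replace with a convolutional inductive bias. The transition tensor, reward matrix, discount and iteration count below are illustrative assumptions, not values taken from the cited papers.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, n_iters=50):
    """Explicit tabular value iteration.

    P: (A, S, S) transition probabilities, P[a, s, s'] = Pr(s' | s, a)
    R: (S, A) expected immediate rewards
    Returns the value estimate V of shape (S,).
    """
    n_states = R.shape[0]
    V = np.zeros(n_states)
    for _ in range(n_iters):
        # Q[s, a] = R[s, a] + gamma * sum_{s'} P[a, s, s'] * V[s']
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V = Q.max(axis=1)  # Bellman optimality backup
    return V

# Toy 2-state, 2-action MDP (illustrative numbers only)
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.3, 0.7]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
print(value_iteration(P, R))
```

Each sweep applies the Bellman optimality backup to every state; the VIN observation is that, on grid-structured MDPs, this backup can be expressed as a convolution followed by a max over channels.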
“…Through the lens of algorithmic alignment (Xu et al., 2019), GNNs can be constructed that closely mimic iterative computation (Veličković et al., 2019), linearithmic sequence processing (Freivalds et al., 2019), and pointer-based data structures. Such approaches are also capable of strong generalisation (Yan et al., 2020) and data-efficient planning (Deac et al., 2020).…”
Section: Introduction (mentioning)
confidence: 99%
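To make the algorithmic-alignment point above concrete, the following hedged sketch rewrites a single value-iteration step as max-aggregation message passing over an edge list, which is the form a GNN executor can closely mimic. The deterministic edge list and constants are illustrative assumptions only.

```python
import numpy as np

def vi_step_message_passing(edges, V, gamma=0.9):
    """One value-iteration step written as max-aggregation message passing.

    edges: list of (s, s_next, reward) tuples, one per state-action edge
           (a deterministic-MDP simplification for illustration).
    V:     (S,) current value estimates.
    Each edge sends the message reward + gamma * V[s_next] to its source
    node; nodes aggregate the messages they receive with max, mirroring
    the Bellman backup.
    """
    V_new = np.full_like(V, -np.inf)
    for s, s_next, reward in edges:
        V_new[s] = max(V_new[s], reward + gamma * V[s_next])
    # States with no outgoing edges keep their old value.
    return np.where(np.isneginf(V_new), V, V_new)

edges = [(0, 1, 1.0), (1, 0, 0.0), (1, 1, 2.0)]
V = np.zeros(3)
for _ in range(20):
    V = vi_step_message_passing(edges, V)
print(V)
```

Each edge plays the role of a message function and each node's max over received messages plays the role of the permutation-invariant aggregator, which is why a GNN with max aggregation aligns step-for-step with the value-iteration update.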

Persistent Message Passing

Strathmann, Barekatain, Blundell et al., 2021. Preprint (self-citation).
“…Specifically, the XLVIN architecture [Deac et al., 2020] is an exact instance of our blueprint for the VI algorithm. Besides improved data efficiency over more traditional approaches to RL, it also compared favourably against ATreeC [Farquhar et al., 2017], which attempts to directly apply VI in a neural pipeline, thus encountering the algorithmic bottleneck problem in low-data regimes.…”
(mentioning)
confidence: 99%
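The quoted passage treats XLVIN as an exact instance of the neural algorithmic reasoning blueprint for VI. As a purely illustrative sketch of how such a latent-planning pipeline might be wired together, the hypothetical PyTorch module below encodes an observation, expands a one-step local graph with a learned transition model, runs a few max-aggregating executor steps over it, and conditions the policy on the result. All class, module, and size choices here are assumptions, not the authors' implementation (in which, notably, the executor is pre-trained to imitate value iteration on synthetic MDPs).

```python
import torch
import torch.nn as nn

class XLVINStyleAgent(nn.Module):
    """Hypothetical sketch of an XLVIN-like pipeline: encode the observation
    into a latent state, expand imagined successor latents with a learned
    transition model, run a few executor (message-passing) steps, and
    condition the policy on the executor output. Names and sizes are
    illustrative assumptions, not the published architecture."""

    def __init__(self, obs_dim, latent_dim, n_actions, vi_steps=3):
        super().__init__()
        self.n_actions, self.vi_steps = n_actions, vi_steps
        self.encoder = nn.Sequential(nn.Linear(obs_dim, latent_dim), nn.ReLU())
        # Transition model: predicts the successor latent for each action.
        self.transition = nn.Linear(latent_dim + n_actions, latent_dim)
        # Executor: updates a node embedding from an aggregated neighbour message.
        self.executor = nn.Linear(2 * latent_dim, latent_dim)
        self.policy_head = nn.Linear(2 * latent_dim, n_actions)

    def expand(self, z):
        """One-step expansion: imagined successor latents, one per action."""
        actions = torch.eye(self.n_actions)
        z_rep = z.expand(self.n_actions, -1)
        return self.transition(torch.cat([z_rep, actions], dim=-1))

    def forward(self, obs):
        z = self.encoder(obs)          # latent for the current state
        neighbours = self.expand(z)    # (n_actions, latent_dim)
        h = z
        for _ in range(self.vi_steps):
            # Max-aggregate neighbour messages, echoing the Bellman backup.
            msg = neighbours.max(dim=0).values
            h = torch.relu(self.executor(torch.cat([h, msg], dim=-1)))
        return self.policy_head(torch.cat([z, h], dim=-1))

agent = XLVINStyleAgent(obs_dim=8, latent_dim=16, n_actions=4)
print(agent(torch.randn(8)).shape)  # logits over 4 actions
```

Because planning happens on latent embeddings rather than scalar value predictions, such an executor sidesteps the algorithmic bottleneck mentioned in the quoted statement.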

Neural Algorithmic Reasoning

Veličković, Blundell, 2021. Preprint (self-citation).