Generalized Value Iteration Networks: Life Beyond Lattices

Niu, Sufeng; Chen, Siheng; Guo, Hanyu; Targonski, Colin; Smith, Melinda; Kovačević, Jelena

doi:10.48550/arxiv.1706.02416

Cited by 8 publications

(5 citation statements)

References 15 publications


“…It is promising because it can be integrated into a larger differentiable system to form a closed loop. Grimm et al (2020; propose to understand model-based planning algorithms from value equivalence perspective. Value iteration network (VIN) (Tamar et al, 2016) is a representative work that performs value iteration using convolution on lattice grids, and has been further extended (Niu et al, 2017; Lee et al, 2018; Chaplot et al, 2021; Deac et al, 2021). Other than using convolution network, the work on integrating learning and planning into differentiable networks includes (Oh et al, 2017; Karkus et al, 2017; Weber et al, 2018; Srinivas et al, 2018; Schrittwieser et al, 2019; Amos & Yarats, 2019; Wang & Ba, 2019; Hafner et al, 2020; Pong et al, 2018; Clavera et al, 2020).…”

Section: Differentiable Planning (mentioning)

confidence: 99%

Scaling up and Stabilizing Differentiable Planning with Implicit Differentiation

Zhao, Xu, Wong. 2022. Preprint.


Differentiable planning promises end-to-end differentiability and adaptivity. However, an issue prevents it from scaling up to larger-scale problems: they need to differentiate through forward iteration layers to compute gradients, which couples forward computation and backpropagation and needs to balance forward planner performance and computational cost of the backward pass. To alleviate this issue, we propose to differentiate through the Bellman fixed-point equation to decouple forward and backward passes for Value Iteration Network and its variants, which enables constant backward cost (in planning horizon) and flexible forward budget and helps scale up to large tasks. We study the convergence stability, scalability, and efficiency of the proposed implicit version of VIN and its variants and demonstrate their superiorities on a range of planning tasks: 2D navigation, visual navigation, and 2-DOF manipulation in configuration space and workspace.
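The Bellman fixed point mentioned in this abstract can be illustrated with a minimal sketch. This is plain forward value iteration on a hypothetical four-state chain, not the paper's implicit-differentiation method; the names `bellman_backup` and `solve_fixed_point`, the toy rewards, and the tolerance are all assumptions made for illustration. It only shows the condition V = T(V) that the fixed-point formulation refers to.

```python
def bellman_backup(values, rewards, neighbors, gamma):
    """One synchronous Bellman backup on a deterministic toy MDP.

    Moving from state s to a neighbor s2 yields rewards[s2] + gamma * values[s2].
    """
    return [
        max(rewards[s2] + gamma * values[s2] for s2 in neighbors[s])
        for s in range(len(values))
    ]


def solve_fixed_point(rewards, neighbors, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Iterate the backup until the largest change (the residual) drops below tol."""
    values = [0.0] * len(rewards)
    iterations = 0
    for _ in range(max_iters):
        new_values = bellman_backup(values, rewards, neighbors, gamma)
        residual = max(abs(a - b) for a, b in zip(new_values, values))
        values = new_values
        iterations += 1
        if residual < tol:
            break
    return values, iterations


# Hypothetical chain 0 - 1 - 2 - 3; state 3 is an absorbing goal that pays 1 per step.
toy_neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2, 3], 3: [3]}
toy_rewards = [0.0, 0.0, 0.0, 1.0]

fixed_values, n_iters = solve_fixed_point(toy_rewards, toy_neighbors)
print(fixed_values, n_iters)
```

At the returned values, one more backup changes nothing beyond the tolerance, which is the property an implicit approach would differentiate through instead of unrolling every iteration.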


“…For instance, Value Iteration Networks (VINs) are a kind of convolutional neural network that can learn to formulate an MDP from an observation of an environment, solve that MDP, and use the result to choose an action (Tamar, Wu, Thomas, Levine, & Abbeel, 2016). Generalised VINs extend this approach to MDPs with more general transition dynamics by employing graph convolutional neural networks instead of ordinary convolutional neural networks (Niu, Chen, Guo, Targonski, Smith, & Kovačević, 2017). In a similar vein, schema networks learn a STRIPS-like environment model using a specially-structured neural network, then choose actions by planning on that learnt model (Kansky, Silver, Mély, Eldawy, Lázaro-Gredilla, Lou, Dorfman, Sidor, Phoenix, & George, 2017).…”

Section: Other Related Planning Work (mentioning)

confidence: 99%

ASNets: Deep Learning for Generalised Planning

Toyer, Trevizan, Thiébaux, et al. 2019. Preprint.


In this paper, we discuss the learning of generalised policies for probabilistic and classical planning problems using Action Schema Networks (ASNets). The ASNet is a neural network architecture that exploits the relational structure of (P)PDDL planning problems to learn a common set of weights that can be applied to any problem in a domain. By mimicking the actions chosen by a traditional, non-learning planner on a handful of small problems in a domain, ASNets are able to learn a generalised reactive policy that can quickly solve much larger instances from the domain. This work extends the ASNet architecture to make it more expressive, while still remaining invariant to a range of symmetries that exist in PPDDL problems. We also present a thorough experimental evaluation of ASNets, including a comparison with heuristic search planners on seven probabilistic and deterministic domains, an extended evaluation on over 18,000 Blocksworld instances, and an ablation study. Finally, we show that sparsity-inducing regularisation can produce ASNets that are compact enough for humans to understand, yielding insights into how the structure of ASNets allows them to generalise across a domain.
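The citation statement for this entry contrasts VINs on lattices with generalised VINs on graphs. One way to picture the relationship is that a lattice is a special case of a graph. The sketch below is an illustration only, with no learned weights and no claim to match any of the cited architectures: `grid_to_graph` and `propagate` are hypothetical helpers that turn a 4-connected grid into adjacency lists and spread a goal value by a discounted max over neighbours.

```python
def grid_to_graph(height, width):
    """Adjacency lists for a 4-connected lattice; node id = row * width + col."""
    neighbors = {}
    for row in range(height):
        for col in range(width):
            adjacent = []
            for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                r, c = row + d_row, col + d_col
                if 0 <= r < height and 0 <= c < width:
                    adjacent.append(r * width + c)
            neighbors[row * width + col] = adjacent
    return neighbors


def propagate(values, neighbors, gamma=0.9):
    """One aggregation step: keep the current value or take the best discounted neighbour value."""
    return [
        max(values[node], gamma * max(values[n] for n in neighbors[node]))
        for node in range(len(values))
    ]


# Hypothetical 3x3 grid with a goal value of 1.0 in the bottom-right corner (node 8).
grid_neighbors = grid_to_graph(3, 3)
grid_values = [0.0] * 9
grid_values[8] = 1.0
for _ in range(4):
    grid_values = propagate(grid_values, grid_neighbors)
print(grid_values)
```

Because `propagate` only reads the adjacency lists, the same step would run unchanged on an irregular graph, which is the sense in which the graph formulation generalises the lattice one.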


“…The integration of planning and neural networks has also been investigated in the context of deep reinforcement learning. For instance, Value Iteration Networks (Tamar et al 2016; Niu et al 2017) (VINs) learn to formulate and solve a probabilistic planning problem within a larger deep neural network. A VIN's internal model can allow it to learn more robust policies than would be possible with ordinary feedforward neural networks.…”

Section: Related Work (mentioning)

confidence: 99%

Action Schema Networks: Generalised Policies With Deep Learning

Toyer, Trevizan, Thiébaux, et al. 2018. AAAI.


In this paper, we introduce the Action Schema Network (ASNet): a neural network architecture for learning generalised policies for probabilistic planning problems. By mimicking the relational structure of planning problems, ASNets are able to adopt a weight sharing scheme which allows the network to be applied to any problem from a given planning domain. This allows the cost of training the network to be amortised over all problems in that domain. Further, we propose a training method which balances exploration and supervised training on small problems to produce a policy which remains robust when evaluated on larger problems. In experiments, we show that ASNet's learning capability allows it to significantly outperform traditional non-learning planners in several challenging domains.
