As classical planning is known to be computationally hard, no single planner is expected to work well across many planning domains. One solution to this problem is to use online portfolio planners that select a planner for a given task. These portfolios perform a classification task, a well-known and well-researched task in the field of machine learning. The classification is usually performed using a representation of planning tasks with a collection of hand-crafted statistical features. Recent machine learning techniques based on the automatic extraction of features have not yet been employed, due to the lack of suitable representations of planning tasks. In this work, we alleviate this barrier. We suggest representing planning tasks by images, allowing us to exploit arguably one of the most commonly used and best-developed techniques in deep learning. We explore some of the questions that inevitably arise when applying such a technique, and present various ways of building practically useful online portfolio-based planners. Evidence of the usefulness of our proposed technique is a planner that won the cost-optimal track of the International Planning Competition 2018.
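To make the selection step concrete, the sketch below shows a minimal image-based planner selector: a small convolutional classifier over a fixed-size grayscale encoding of a planning task. This is a hedged illustration only; the image encoding, image size, and set of candidate planners are assumptions, not the abstract's actual architecture.

```python
# Minimal sketch (assumption-laden): CNN planner selection over task "images".
import torch
import torch.nn as nn

class PlannerSelectorCNN(nn.Module):
    def __init__(self, num_planners: int, image_size: int = 128):
        super().__init__()
        # Two small conv blocks over a 1-channel task image (placeholder encoding).
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        flat = 32 * (image_size // 4) * (image_size // 4)
        self.classifier = nn.Linear(flat, num_planners)

    def forward(self, x):
        # x: (batch, 1, image_size, image_size) grayscale encoding of a planning task
        h = self.features(x)
        return self.classifier(h.flatten(start_dim=1))

# Usage: pick the planner with the highest predicted score for one task image.
model = PlannerSelectorCNN(num_planners=4)
task_image = torch.rand(1, 1, 128, 128)  # placeholder image encoding of a task
best_planner = model(task_image).argmax(dim=1).item()
```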
Automated planning is one of the foundational areas of AI. Since no single planner can work well for all tasks and domains, portfolio-based techniques have become increasingly popular in recent years. In particular, deep learning emerges as a promising methodology for online planner selection. Owing to the recent development of structural graph representations of planning tasks, we propose a graph neural network (GNN) approach to selecting candidate planners. GNNs are advantageous over a straightforward alternative, convolutional neural networks, in that they are invariant to node permutations and that they incorporate node labels for better inference. Additionally, for cost-optimal planning, we propose a two-stage adaptive scheduling method to further improve the likelihood that a given task is solved in time. The scheduler may switch at halftime to a different planner, conditioned on the observed performance of the first one. Experimental results validate the effectiveness of the proposed method against strong baselines, both deep learning and non-deep learning based. The code is available at https://github.com/matenure/GNN_planner.
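The following is a minimal sketch of the two-stage scheduling idea under simplifying assumptions: the top-ranked planner gets half of the time budget, and if it does not finish, the remaining budget goes to an alternative planner. The run_planner helper and the switching criterion are illustrative placeholders, not the paper's exact logic.

```python
# Hedged sketch of a halftime-switch scheduler (illustrative, not the paper's code).
import subprocess

def run_planner(cmd, timeout_s):
    """Run a planner command line; return True if it exits successfully in time."""
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode == 0
    except subprocess.TimeoutExpired:
        return False

def two_stage_schedule(first_cmd, second_cmd, budget_s):
    """Give the selected planner half the budget, then possibly switch."""
    half = budget_s / 2
    if run_planner(first_cmd, half):
        return "solved by first planner"
    # Halftime switch: the first planner did not finish, so the remaining
    # budget goes to the alternative planner chosen by the selector.
    if run_planner(second_cmd, budget_s - half):
        return "solved by second planner"
    return "unsolved within budget"
```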
How can we train neural network (NN) heuristic functions for classical planning, using only states as the NN input? Prior work addressed this question by (a) per-instance imitation learning and/or (b) per-domain learning. The former limits the approach to instances small enough for training data generation, the latter to domains where the necessary knowledge generalizes across instances. Here we explore three methods for (a) that make training data generation scalable through bootstrapping and approximate value iteration. In particular, we introduce a new bootstrapping variant that estimates search effort instead of goal distance, which, as we show, converges to the perfect heuristic under idealized circumstances. We empirically compare these methods to (a) and (b), aligning three different NN heuristic function learning architectures for cross-comparison in an experiment of unprecedented breadth in this context. Key lessons are that our methods and imitation learning are highly complementary; that per-instance learning often yields stronger heuristics than per-domain learning; and that, while the LAMA planner is still dominant, our methods outperform it in one benchmark domain.
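The control loop behind such bootstrapping can be sketched roughly as follows; sample_states, solve_with, and train are hypothetical callables standing in for the state generator, the heuristic-guided search, and the NN regression step, and are not the paper's actual interfaces.

```python
# Hedged sketch of a bootstrapping loop for heuristic learning (assumed interfaces).
def bootstrap(model, sample_states, solve_with, train, rounds=10, budget_s=60):
    """Iteratively regenerate training data with the current learned heuristic.

    sample_states(): yields planning states, e.g. of growing difficulty (assumption)
    solve_with(model, state, budget_s): runs search guided by the learned heuristic
        and returns a plan cost (or search effort, in the new variant) if the state
        was solved within the budget, else None
    train(model, data): fits the NN heuristic to the collected (state, target) pairs
    """
    for _ in range(rounds):
        data = []
        for state in sample_states():
            target = solve_with(model, state, budget_s)
            if target is not None:
                data.append((state, target))
        train(model, data)  # the better the heuristic, the more states get solved next round
    return model
```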
Testing is a promising way to gain trust in neural action policies π. Previous work on policy testing in sequential decision making targeted environment behavior leading to failure conditions. But if the failure is unavoidable given that behavior, then π is not actually to blame. For a situation to qualify as a "bug" in π, there must be an alternative policy π' that does better. We introduce a generic policy testing framework based on that intuition. This raises the bug confirmation problem: deciding whether or not a state is a bug. We analyze the use of optimistic and pessimistic bounds for the design of test oracles approximating that problem. We contribute an implementation of our framework in classical planning, experimenting with several test oracles and with random-walk methods that generate test states biased towards poor policy performance and/or state novelty. We evaluate these techniques on policies π learned with ASNets. We find that they are able to effectively identify bugs in these π, and that our random-walk biases improve over uninformed baselines.
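One way to read the bound-based oracles is the decision rule sketched below: a state counts as a confirmed bug when a pessimistic (upper) bound on the optimal cost beats the policy's own cost, and as confirmed bug-free when the policy already matches an optimistic (lower) bound. This is an illustrative reading under assumed inputs, not the paper's exact oracle definitions.

```python
# Hedged sketch of a bound-based test oracle (illustrative decision rule).
def bug_oracle(policy_cost, lower_bound, upper_bound):
    """Classify one test state as 'bug', 'no-bug', or 'unknown'.

    policy_cost: cost incurred by rolling out the tested policy pi from the state
    lower_bound: optimistic (admissible) estimate of the optimal cost
    upper_bound: pessimistic bound, e.g. the cost of any plan found by another
                 planner, which witnesses an alternative policy pi'
    """
    if upper_bound < policy_cost:
        return "bug"      # some alternative provably does better than pi here
    if policy_cost <= lower_bound:
        return "no-bug"   # pi is already optimal from this state
    return "unknown"      # the bounds are too loose to decide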