We study combinatorial problems with real-world applications such as machine scheduling, routing, and assignment. We propose a method that combines Reinforcement Learning (RL) and planning. The method applies equally to the offline and online variants of a combinatorial problem; in the online variant, the problem components (e.g., jobs in scheduling problems) are not known in advance but arrive during the decision-making process. Our solution is generic and scalable, and it leverages distributional knowledge of the problem parameters. We frame the solution process as a Markov decision process (MDP) and take a Deep Q-Learning approach in which states are represented as graphs, allowing our trained policies to handle arbitrary changes in a principled manner. Although learned policies work well in expectation, small deviations can have substantial negative effects in combinatorial settings. We mitigate this drawback by employing our graph-convolutional policies as non-optimal heuristics in a compatible search algorithm, Monte Carlo Tree Search, to significantly improve overall performance. We demonstrate our method on two problems: Machine Scheduling and Capacitated Vehicle Routing. We show that our method outperforms custom-tailored mathematical solvers, state-of-the-art learning-based algorithms, and common heuristics, in both computation time and performance.
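The idea of using a non-optimal policy as a heuristic inside Monte Carlo Tree Search can be illustrated with a minimal sketch. This is not the authors' implementation: the problem (assigning jobs, in arrival order, to identical machines to minimise makespan), the hand-written `greedy_action` rule standing in for a learned graph-convolutional Q-network, and all names and parameters (`mcts_schedule`, `n_sims`, `c`, `eps`) are assumptions made for illustration.

```python
import math
import random


def greedy_action(loads):
    """Hypothetical stand-in for a learned policy: pick the least-loaded machine."""
    return min(range(len(loads)), key=lambda m: loads[m])


def rollout(jobs, idx, loads, rng, eps):
    """Finish the schedule with the heuristic, deviating at random with probability eps."""
    loads = list(loads)
    actions = []
    for p in jobs[idx:]:
        if rng.random() < eps:
            m = rng.randrange(len(loads))
        else:
            m = greedy_action(loads)
        loads[m] += p
        actions.append(m)
    return max(loads), actions


class Node:
    def __init__(self, idx, loads):
        self.idx, self.loads = idx, loads   # next job index, machine loads
        self.children = {}                  # action (machine) -> Node
        self.visits, self.total = 0, 0.0
        # pop() returns the heuristic's preferred machine first.
        self.untried = sorted(range(len(loads)),
                              key=lambda m: (loads[m], m), reverse=True)


def ucb(parent, child, c):
    return child.total / child.visits + c * math.sqrt(
        math.log(parent.visits) / child.visits)


def mcts_schedule(jobs, n_machines, n_sims=2000, c=1.0, eps=0.2, seed=0):
    rng = random.Random(seed)
    root = Node(0, (0,) * n_machines)
    # Start from the pure heuristic solution, so search can only improve on it.
    best_cost, best_plan = rollout(jobs, 0, root.loads, rng, 0.0)
    scale = float(sum(jobs))                # keeps rewards in [-1, 0]
    for _ in range(n_sims):
        node, path, plan = root, [root], []
        # Selection: follow UCB while the node is fully expanded and non-terminal.
        while node.idx < len(jobs) and not node.untried:
            parent = node
            a, node = max(parent.children.items(),
                          key=lambda kv: ucb(parent, kv[1], c))
            plan.append(a)
            path.append(node)
        # Expansion: add one untried action.
        if node.idx < len(jobs):
            a = node.untried.pop()
            loads = list(node.loads)
            loads[a] += jobs[node.idx]
            child = Node(node.idx + 1, tuple(loads))
            node.children[a] = child
            node = child
            plan.append(a)
            path.append(node)
        # Simulation: heuristic-guided rollout to a complete schedule.
        cost, tail = rollout(jobs, node.idx, node.loads, rng, eps)
        if cost < best_cost:
            best_cost, best_plan = cost, plan + tail
        # Backpropagation: reward is the negative normalised makespan.
        for n in path:
            n.visits += 1
            n.total += -cost / scale
    return best_cost, best_plan
```

Calling `mcts_schedule([3, 3, 2, 2, 2], 2)` returns a makespan and a machine index per job. Because the search is seeded with the pure heuristic schedule, the returned makespan is never worse than what the heuristic achieves alone, which mirrors the abstract's point that search compensates for small policy errors.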
Fig. 1. Comparison of the ability of a simulated environment and a real dozer to grade an area studded with piles. Top row: RGB images of our experimental setup (see Section III-C) showing the scaled dozer facing the sand piles. Middle row: Representative height-maps of states in the real environment. Depth observations were projected onto world coordinates using orthographic projections [4]. Bottom row: Similar states observed in the simulated environment. In the middle and bottom rows, the right column shows the initial state, and the remaining columns show states after actions were taken in the simulated and real environments. The figure shows the resemblance between the simulated and real height-maps in the grading task.
Fig. 1. A comparison of the grading policy of a simulated dozer and a real-world scaled prototype on an area containing sand piles. Here, the agent is provided with an initial graded area and is required to extend it by pushing newly added sand piles. Top Row: Photos of our experimental setup (see Section IV-A) showing the scaled dozer prototype facing the sand piles. Middle & Bottom Rows: Heightmaps extracted from our simulation and scaled experimental setup, respectively. Sand piles are captured as dark blobs that indicate their height. Each column compares the grading policy in the experimental setup and the simulation on a similar scenario.
Surface grading, the process of leveling an uneven area containing pre-dumped sand piles, is an important task in the construction site pipeline. This labour-intensive process is often carried out by a dozer, a key piece of machinery at any construction site. Current attempts to automate surface grading assume perfect localization. In real-world scenarios, however, this assumption fails: agents have imperfect perception, which degrades performance. In this work, we address the problem of autonomous grading under uncertainties. First, we implement a simulation and a scaled real-world prototype environment to enable rapid policy exploration and evaluation in this setting. Second, we formalize the problem as a partially observable Markov decision process (POMDP) and train an agent capable of handling such uncertainties. We show, through rigorous experiments, that an agent trained under perfect localization suffers degraded performance when presented with localization uncertainties. An agent trained using our method, however, develops a more robust policy for addressing such errors and, consequently, exhibits better grading performance.
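One way to picture localization uncertainty in a partially observable setting is an observation function that perturbs the agent's pose before extracting what it sees. The sketch below is only an assumption about how such noise might be injected, not the paper's method: the Gaussian noise model, the egocentric height-map window, and every name (`noisy_pose`, `crop`, `observe`, `sigma_xy`, `sigma_theta`, `half`) are hypothetical.

```python
import random


def noisy_pose(pose, sigma_xy, sigma_theta, rng):
    """Return the pose estimate: the true (x, y, theta) plus Gaussian noise."""
    x, y, theta = pose
    return (x + rng.gauss(0.0, sigma_xy),
            y + rng.gauss(0.0, sigma_xy),
            theta + rng.gauss(0.0, sigma_theta))


def crop(heightmap, pose, half):
    """Square window of side 2*half+1 centred on the rounded (x, y) of pose.

    x indexes columns and y indexes rows; cells outside the map read as 0.0.
    """
    rows, cols = len(heightmap), len(heightmap[0])
    cx, cy = round(pose[0]), round(pose[1])
    return [[heightmap[r][c] if 0 <= r < rows and 0 <= c < cols else 0.0
             for c in range(cx - half, cx + half + 1)]
            for r in range(cy - half, cy + half + 1)]


def observe(heightmap, true_pose, sigma_xy, sigma_theta, rng, half=1):
    """The agent sees the map around its estimated pose, not its true pose."""
    estimate = noisy_pose(true_pose, sigma_xy, sigma_theta, rng)
    return crop(heightmap, estimate, half), estimate
```

With both sigmas set to zero the observation reduces to the perfect-localization case, so the same function can describe both training regimes compared in the abstract.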