Abstract: We present an algorithm for generating open-loop trajectories that solve the problem of rearrangement planning under uncertainty. We frame this as a selection problem where the goal is to choose the most robust trajectory from a finite set of candidates. We generate each candidate using a kinodynamic state space planner and evaluate it using noisy rollouts. Our key insight is that we can formalize the selection problem as the "best arm" variant of the multi-armed bandit problem. We use the successive rejects algorithm to efficiently allocate rollouts between candidate trajectories given a rollout budget. We show in simulation that the successive rejects algorithm identifies the best candidate using fewer rollouts than a baseline algorithm. We also show that selecting a good candidate increases the likelihood of successful execution on a real robot.
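As an illustration of the selection step described in the abstract, the following is a minimal sketch of successive rejects applied to a set of candidates under a fixed rollout budget. It uses the standard phase schedule of the algorithm as formulated by Audibert and Bubeck; whether this paper uses exactly that schedule is an assumption. The names `successive_rejects` and `make_noisy_candidate` are hypothetical, and each rollout is replaced by a Bernoulli draw standing in for a noisy physics simulation of a candidate trajectory.

```python
import math
import random


def make_noisy_candidate(success_prob, rng):
    """Hypothetical stand-in for one noisy rollout of a candidate trajectory.

    Returns a zero-argument callable that yields 1 on success and 0 on failure.
    A real system would forward-simulate the trajectory under sampled noise.
    """
    return lambda: 1 if rng.random() < success_prob else 0


def successive_rejects(rollout_fns, budget):
    """Pick the candidate with the highest estimated success rate.

    rollout_fns: list of zero-argument callables returning 1 or 0.
    budget: maximum total number of rollouts across all candidates.
    Returns (index of the surviving candidate, rollouts used per candidate).
    """
    num_arms = len(rollout_fns)
    if num_arms == 0:
        raise ValueError("need at least one candidate")
    if budget < num_arms:
        raise ValueError("budget must be at least the number of candidates")

    # Normalizer of the standard schedule: 1/2 + sum_{i=2}^{K} 1/i.
    log_bar = 0.5 + sum(1.0 / i for i in range(2, num_arms + 1))

    active = list(range(num_arms))
    successes = [0] * num_arms
    pulls = [0] * num_arms
    prev_target = 0

    # K - 1 phases; each phase rejects the empirically worst active candidate.
    for phase in range(1, num_arms):
        target = math.ceil(
            (budget - num_arms) / (log_bar * (num_arms + 1 - phase))
        )
        for arm in active:
            for _ in range(target - prev_target):
                successes[arm] += rollout_fns[arm]()
                pulls[arm] += 1
        prev_target = target

        # Guard against a zero denominator when the budget is very small.
        worst = min(active, key=lambda a: successes[a] / max(pulls[a], 1))
        active.remove(worst)

    return active[0], pulls


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed success probabilities for four illustrative candidates.
    candidates = [make_noisy_candidate(p, rng) for p in (0.1, 0.2, 0.9, 0.3)]
    best, pulls = successive_rejects(candidates, budget=400)
    print("selected candidate:", best, "rollouts per candidate:", pulls)
```

Because rejected candidates receive no further rollouts, later phases spend the remaining budget on the candidates that are hardest to tell apart, which is the source of the savings over allocating rollouts uniformly.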
I. Introduction

We explore the rearrangement planning problem [11] where a robot must rearrange several objects in a cluttered environment to achieve a goal (Fig. 1, top). Recent work has used simple physics models, like quasistatic pushing [8,9,22,32,39,43], to quickly produce efficient and intricate plans. However, the intricacy of these plans makes them particularly sensitive to uncertainty in object pose, physics parameters, and trajectory execution.

We address this problem by generating open-loop trajectories that are robust to uncertainty. However, this is particularly hard for rearrangement planning.

First, the problem is set in a high-dimensional space with continuous actions. Second, contact causes physics to evolve in complex, non-linear ways and quickly leads to multi-modal and non-smooth distributions [25,26,34]. Third, finding good trajectories is inherently hard: most trajectories achieve success with zero probability.

As a consequence, this problem lies outside the domains of standard conformant planning [19,37], POMDP [20] and policy search [7] algorithms.

In response to these challenges, we propose a domain-agnostic algorithm that only requires: (1) a stochastic method of generating trajectories, (2) the ability to forward-simulate the system's dynamics, and (3) the capability of testing whether an execution is successful.

Exploiting the fact that we can quickly generate feasible state space trajectories [22], we formulate rearrangement planning under uncertainty (sections III and IV)