Decentralized partially observable Markov decision processes (Dec-POMDPs) provide a general model for decision-making under uncertainty in decentralized settings, but are difficult to solve optimally (NEXP-complete). As a new way of solving these problems, we introduce the idea of transforming a Dec-POMDP into a continuous-state deterministic MDP with a piecewise-linear and convex value function. This approach makes use of the fact that planning can be accomplished in a centralized offline manner, while execution can still be decentralized. This new Dec-POMDP formulation, which we call an occupancy MDP, allows powerful POMDP and continuous-state MDP methods to be applied to Dec-POMDPs for the first time. To provide scalability, we refine this approach by combining heuristic search and compact representations that exploit the structure present in multi-agent domains, without losing the ability to converge to an optimal solution. In particular, we introduce a feature-based heuristic search value iteration (FB-HSVI) algorithm that relies on feature-based compact representations, point-based updates, and efficient action selection. A theoretical analysis demonstrates that FB-HSVI terminates in finite time with an optimal solution. We include an extensive empirical analysis on well-known benchmarks, demonstrating that our approach provides significant scalability improvements over the state of the art.
Abstract. Exploring an unknown environment with multiple robots requires an efficient coordination method to minimize the total exploration time. A standard method to discover new areas is to assign frontiers (boundaries between unexplored and explored accessible areas) to robots. In this context, the frontier allocation method is paramount. This paper introduces a decentralized and computationally efficient frontier allocation method that favors a well-balanced spatial distribution of robots in the environment. For this purpose, each robot evaluates its relative rank among the other robots in terms of travel distance to each frontier. Each robot is then allocated to the frontier for which it has the lowest rank. To evaluate this criterion, a wavefront propagation is computed from each frontier, which offers an interesting alternative to planning a path from each robot to each frontier. Comparisons with existing approaches in computer simulation and on real robots demonstrate the validity and efficiency of our algorithm.
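The rank-based allocation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions not stated in the abstract: the map is a 4-connected occupancy grid (0 = free, 1 = obstacle), the wavefront is a plain breadth-first search with unit step cost, a robot's rank for a frontier is the number of other robots strictly closer to that frontier, and ties in rank are broken by travel distance. The names `wavefront` and `allocate` are hypothetical. For demonstration the loop computes every robot's choice in one place; in a decentralized setting each robot would run the inner part for itself only.

```python
from collections import deque


def wavefront(grid, start):
    """Breadth-first wavefront from `start` over free cells (0 = free, 1 = obstacle).

    Returns a dict mapping each reachable cell to its travel distance from `start`.
    Assumption: 4-connected grid with unit step cost.
    """
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist


def allocate(grid, robots, frontiers):
    """Assign each robot the frontier for which its rank is lowest.

    `robots` maps a robot name to its (row, col) cell; `frontiers` is a list of cells.
    Rank of a robot for a frontier = number of other robots strictly closer to it.
    Assumption: rank ties are broken by the robot's own travel distance.
    """
    # One wavefront per frontier gives the distance from that frontier to every robot.
    fields = {f: wavefront(grid, f) for f in frontiers}
    inf = float("inf")
    assignment = {}
    for name, pos in robots.items():
        best_key, best_frontier = None, None
        for f in frontiers:
            d = fields[f].get(pos, inf)
            rank = sum(
                1
                for other, other_pos in robots.items()
                if other != name and fields[f].get(other_pos, inf) < d
            )
            key = (rank, d)
            if best_key is None or key < best_key:
                best_key, best_frontier = key, f
        assignment[name] = best_frontier
    return assignment


if __name__ == "__main__":
    corridor = [[0] * 7]                       # a 1 x 7 free corridor
    robots = {"a": (0, 1), "b": (0, 2)}
    frontiers = [(0, 0), (0, 6)]
    print(allocate(corridor, robots, frontiers))
```

In this corridor, robot `b` is closer to the frontier at `(0, 0)` than to the one at `(0, 6)` (distance 2 versus 4), so a nearest-frontier rule would send both robots left. Under the rank criterion, `b` has rank 1 for `(0, 0)` because `a` is closer, and rank 0 for `(0, 6)`, so it heads right while `a` goes left, which is the balanced spatial distribution the abstract aims for.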