State-of-the-art sampling-based online POMDP solvers compute near-optimal policies for POMDPs with very large state spaces. However, when faced with large observation spaces, they may become overly optimistic and compute suboptimal policies because of particle divergence. This paper presents a new online POMDP solver, DESPOT-α, which builds upon the widely used DESPOT solver. DESPOT-α improves the practical performance of online planning for POMDPs with large observation spaces as well as large state spaces. Like DESPOT, DESPOT-α uses the particle belief approximation and searches a determinized sparse belief tree. To tackle large observation spaces, DESPOT-α shares sub-policies among many observations during online policy computation. The value function of a sub-policy is a linear function of the belief, commonly known as an α-vector. We introduce a particle approximation of the α-vector to improve the efficiency of online policy search. We further speed up DESPOT-α using the CPU and GPU parallelization ideas introduced in HyP-DESPOT. Experimental results show that DESPOT-α and HyP-DESPOT-α outperform DESPOT and HyP-DESPOT, respectively, on POMDPs with large observation spaces, including a complex simulation task involving an autonomous vehicle driving among many pedestrians.
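To make the linearity of a sub-policy's value concrete, the following minimal sketch evaluates one particle-approximated α-vector on two weighted particle beliefs. It is an illustration only, not the paper's implementation: the function name `particle_alpha_value`, the dictionary representation of the α-vector, the toy state names and values, and the assumption that both beliefs are weighted over the same particle states are all hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

# Hypothetical representation: an alpha-vector stored only at sampled
# particle states, mapping each state to the sub-policy's value there.
AlphaVector = Dict[Hashable, float]
# A particle belief as a list of (state, weight) pairs.
ParticleBelief = List[Tuple[Hashable, float]]


def particle_alpha_value(alpha: AlphaVector, belief: ParticleBelief) -> float:
    """Approximate V(b) = sum_i w_i * alpha(s_i), with weights normalized."""
    total_weight = sum(weight for _, weight in belief)
    if total_weight <= 0:
        raise ValueError("belief must have positive total weight")
    weighted_sum = sum(weight * alpha[state] for state, weight in belief)
    return weighted_sum / total_weight


# Toy values, chosen only for illustration.
alpha = {"s0": 10.0, "s1": 2.0, "s2": -4.0}

# Two beliefs over the same particles with different weights, standing in
# for beliefs reached under two different observations (an assumption).
belief_a = [("s0", 0.5), ("s1", 0.25), ("s2", 0.25)]
belief_b = [("s0", 0.25), ("s1", 0.25), ("s2", 0.5)]

value_a = particle_alpha_value(alpha, belief_a)
value_b = particle_alpha_value(alpha, belief_b)
print(value_a, value_b)
```

Because the value is linear in the weights, the same stored α-vector can be reused to score both beliefs, which is the property that makes sharing a sub-policy across observations cheap in this sketch.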