Accurate robot localization and mapping can be improved through the adoption of globally optimal registration methods, like the Angular Radon Spectrum (ARS). In this paper, we present Cud-ARS, an efficient variant of the ARS algorithm for 2D registration designed for parallel execution of the most computationally expensive steps on Nvidia™ Graphics Processing Units (GPUs). Cud-ARS is able to compute the ARS in parallel blocks, with each associated to a subset of input points. We also propose a global branch-and-bound method for translation estimation. This novel parallel algorithm has been tested on multiple datasets. The proposed method is able to speed up the execution time by two orders of magnitude while obtaining more accurate results in rotation estimation than state-of-the-art correspondence-based algorithms. Our experiments also assess the potential of this novel approach in mapping applications, showing the contribution of GPU programming to efficient solutions of robotic tasks.