Computing the interactions between the stars within dense stellar clusters is a problem of fundamental importance in theoretical astrophysics. However, simulating realistically sized clusters of about 10^6 stars is computationally intensive and often takes a long time to complete. This paper presents the parallelization of a Monte Carlo method-based algorithm for simulating stellar cluster evolution on programmable Graphics Processing Units (GPUs). The kernels of this algorithm involve the numerical methods of root bisection and von Neumann rejection. Our experiments show that although these kernels exhibit data-dependent decision making and unavoidable non-contiguous memory accesses, the GPU can still deliver substantial, near-linear speedups that are unlikely to be achieved on a CPU-based system. For problem sizes ranging from 10^6 to 7 × 10^6 stars, we obtain up to 28× speedups for these kernels, and a 2× overall application speedup, on an NVIDIA GTX280 GPU over the sequential version run on an AMD Phenom Quad-Core processor.
General Terms
Performance
Keywords
Graphics processing unit (GPU), CUDA, Monte Carlo simulation, bisection method, parallel random number generator, multi-scale simulation.