“…The Monte Carlo simulation itself is another naturally parallel problem: it averages the outcomes of many independent random samples, which makes it an analog of the Mandelbrot set generator, where each pixel is likewise computed independently. Studies comparing the two frameworks found that CUDA transfers data to and from the GPU faster than OpenCL and that CUDA's kernel execution is also consistently faster, even though the two implementations run nearly identical code [12,13]. Historically, various applications have been studied on shared-memory multiprocessors, GPUs, and message-passing systems, and their performance has been evaluated on these systems [17,18,19,20,25,26,27].…”
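As a minimal illustration of why Monte Carlo methods parallelize so naturally, the following Python sketch estimates π by averaging many random samples. Estimating π is an assumed example chosen for illustration (the excerpt does not say which Monte Carlo problem was benchmarked), and the names `count_hits` and `estimate_pi` are hypothetical. Each chunk of samples carries its own seed and shares no state with the others, so chunks can run on separate workers and be combined with a single sum, much as GPU threads might each process their own samples before a reduction.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def count_hits(task):
    """Count how many of n random points in the unit square fall inside
    the quarter circle x^2 + y^2 <= 1. Each task carries its own seed, so
    chunks are fully independent (no shared state, no communication)."""
    n, seed = task
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(total_samples, n_chunks, base_seed=0, map_fn=map):
    """Split the samples into independent chunks, evaluate them with map_fn
    (the built-in serial map by default, or an executor's map for
    parallelism), then combine the chunk results with one sum."""
    base, extra = divmod(total_samples, n_chunks)
    # Spread any remainder over the first `extra` chunks so sizes sum exactly.
    tasks = [(base + (1 if i < extra else 0), base_seed + i) for i in range(n_chunks)]
    total_hits = sum(map_fn(count_hits, tasks))
    # The hit fraction approximates pi/4 (quarter-circle area / square area).
    return 4.0 * total_hits / total_samples


if __name__ == "__main__":
    serial = estimate_pi(200_000, n_chunks=8, base_seed=42)
    with ProcessPoolExecutor() as pool:
        parallel = estimate_pi(200_000, n_chunks=8, base_seed=42, map_fn=pool.map)
    print(serial, parallel)  # same value: each chunk's seed is fixed
```

Switching from the built-in `map` to `pool.map` is the only change needed to run the chunks in parallel. Because every chunk has a fixed seed, the serial and parallel runs produce the same estimate, which keeps results reproducible regardless of how the work is scheduled.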