Parallel and Distributed Computing and Systems 2011
DOI: 10.2316/p.2011.757-018

An Exploration of OpenCL on Multiple Hardware Platforms for a Numerical Relativity Application

Abstract: Currently there is considerable interest in making use of many-core processor architectures, such as Nvidia and AMD graphics processing units (GPUs), for scientific computing. In this work we explore the use of the Open Computing Language (OpenCL) for a typical Numerical Relativity application: a time-domain Teukolsky equation solver (a linear, hyperbolic partial differential equation solver using finite differencing). OpenCL is the only vendor-agnostic and multi-platform parallel computing framework that has …
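For readers unfamiliar with this kind of solver, the following is a minimal illustrative OpenCL C kernel, not taken from the paper, showing what one explicit finite-difference time step of a simple 1-D linear hyperbolic model problem (u_tt = c^2 u_xx) might look like. The kernel name fd_step, its arguments, and the leapfrog discretization are assumptions made purely for illustration.

// Illustrative sketch only (not the authors' code): one explicit
// finite-difference time step for a 1-D linear hyperbolic model problem,
// u_tt = c^2 u_xx, using a centered-difference (leapfrog) update.
#pragma OPENCL EXTENSION cl_khr_fp64 : enable   // enable double precision

__kernel void fd_step(__global const double *u_prev,  // field at t - dt
                      __global const double *u_curr,  // field at t
                      __global double       *u_next,  // field at t + dt (output)
                      const double r2,                 // (c * dt / dx)^2
                      const int n)                     // number of grid points
{
    int i = get_global_id(0);                          // one work-item per grid point
    if (i > 0 && i < n - 1) {
        // Interior update; boundary points would be handled separately.
        u_next[i] = 2.0 * u_curr[i] - u_prev[i]
                  + r2 * (u_curr[i + 1] - 2.0 * u_curr[i] + u_curr[i - 1]);
    }
}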

Cited by 3 publications (3 citation statements)
References 7 publications
“…It seems likely that the embarrassingly parallel nature of the effective source calculation on a grid of points is an ideal candidate for implementation in a GPU programming framework such as CUDA or OpenCL. Given other applications have seen speed-ups by 1 to 2 orders of magnitude [73], it is not unreasonable to expect similar performance gains for effective source calculations.…”
Section: Discussion and Summary
confidence: 98%
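As a concrete illustration of the embarrassingly parallel pattern referred to in the statement above, the sketch below shows an OpenCL C kernel that performs a fully independent per-point evaluation over a grid. The kernel name, its arguments, and the stand-in expression are hypothetical and appear in neither paper.

// Illustrative sketch only: a pure "map" over grid points, with no
// communication between work-items -- the embarrassingly parallel pattern
// referred to above. A real effective-source expression would replace the
// stand-in formula below.
__kernel void evaluate_on_grid(__global const float *r,      // radial coordinate per point
                               __global const float *theta,  // angular coordinate per point
                               __global float       *src,    // output: one value per point
                               const int n)                   // number of grid points
{
    int i = get_global_id(0);
    if (i < n) {
        // Stand-in computation; each point is independent of all others.
        src[i] = cos(theta[i]) / (r[i] * r[i] + 1.0f);
    }
}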
“…In this section we detail the approach we have taken towards parallelism, not only to take advantage of the many cores of a single GPU, but also of those on multiple GPUs. We describe here the different ideas we have implemented and their final performance outcomes [13,14]. The lessons learned have ultimately helped us converge towards a near-optimal implementation.…”
Section: Code Implementation
confidence: 99%
“…In addition, it is necessary to establish the appropriate data communication between the GPU cores and the remaining code executing on the CPU; we use the clEnqueueReadBuffer and clEnqueueWriteBuffer calls to transfer data back and forth from main memory, and we only use global memory on the GPU to simplify communication between the GPU cores. We make this simplification with the goal of keeping the code's portability intact, even if it impacts performance to some extent [14].…”
Section: Code Implementation
confidence: 99%
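The transfer pattern described in this statement can be sketched on the host side as follows. This is an illustrative sketch only, under the assumption that the OpenCL context, command queue, kernel, and global-memory buffers have already been created (e.g. via clCreateCommandQueue and clCreateBuffer); the helper name run_step and its arguments are hypothetical.

/*
 * Illustrative host-side sketch (not the authors' code): copy input data
 * into a GPU global-memory buffer, launch a kernel, and copy the result
 * back to main memory with clEnqueueWriteBuffer / clEnqueueReadBuffer.
 */
#include <CL/cl.h>
#include <stddef.h>

cl_int run_step(cl_command_queue queue,
                cl_kernel        kernel,
                cl_mem           d_in,   /* global-memory buffer on the GPU */
                cl_mem           d_out,  /* global-memory buffer on the GPU */
                double          *h_in,   /* host-side input array           */
                double          *h_out,  /* host-side output array          */
                size_t           n)      /* number of grid points           */
{
    cl_int err;
    size_t bytes  = n * sizeof(double);
    size_t global = n;                   /* one work-item per grid point    */

    /* Host -> device: blocking write into GPU global memory. */
    err = clEnqueueWriteBuffer(queue, d_in, CL_TRUE, 0, bytes, h_in,
                               0, NULL, NULL);
    if (err != CL_SUCCESS) return err;

    /* Bind the device buffers as kernel arguments 0 and 1, then launch. */
    err  = clSetKernelArg(kernel, 0, sizeof(cl_mem), &d_in);
    err |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &d_out);
    if (err != CL_SUCCESS) return err;

    err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, NULL,
                                 0, NULL, NULL);
    if (err != CL_SUCCESS) return err;

    /* Device -> host: blocking read back to main memory. */
    return clEnqueueReadBuffer(queue, d_out, CL_TRUE, 0, bytes, h_out,
                               0, NULL, NULL);
}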