Parallel and Distributed Computing and Systems 2011
DOI: 10.2316/p.2011.757-018

An Exploration of OpenCL on Multiple Hardware Platforms for a Numerical Relativity Application

Abstract: Currently there is considerable interest in making use of many-core processor architectures, such as Nvidia and AMD graphics processing units (GPUs), for scientific computing. In this work we explore the use of the Open Computing Language (OpenCL) for a typical Numerical Relativity application: a time-domain Teukolsky equation solver (a linear, hyperbolic partial differential equation solver using finite differencing). OpenCL is the only vendor-agnostic and multi-platform parallel computing framework that has …
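For readers unfamiliar with this kind of solver, the following is a minimal illustrative OpenCL C kernel, not taken from the paper, showing what one explicit finite-difference time step of a simple 1-D linear hyperbolic model problem (u_tt = c^2 u_xx) might look like. The kernel name fd_step, its arguments, and the leapfrog discretization are assumptions made purely for illustration.

// Illustrative sketch only (not the authors' code): one explicit
// finite-difference time step for a 1-D linear hyperbolic model problem,
// u_tt = c^2 u_xx, using a centered-difference (leapfrog) update.
#pragma OPENCL EXTENSION cl_khr_fp64 : enable   // enable double precision

__kernel void fd_step(__global const double *u_prev,  // field at t - dt
                      __global const double *u_curr,  // field at t
                      __global double       *u_next,  // field at t + dt (output)
                      const double r2,                 // (c * dt / dx)^2
                      const int n)                     // number of grid points
{
    int i = get_global_id(0);                          // one work-item per grid point
    if (i > 0 && i < n - 1) {
        // Interior update; boundary points would be handled separately.
        u_next[i] = 2.0 * u_curr[i] - u_prev[i]
                  + r2 * (u_curr[i + 1] - 2.0 * u_curr[i] + u_curr[i - 1]);
    }
}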

Cited by 3 publications (3 citation statements)
References 7 publications
“…It seems likely that the embarrassingly parallel nature of the effective source calculation on a grid of points is an ideal candidate for implementation in a GPU programming framework such as CUDA or OpenCL. Given other applications have seen speed-ups by 1 to 2 orders of magnitude [73], it is not unreasonable to expect similar performance gains for effective source calculations.…”
Section: Discussion and Summary
confidence: 98%
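As a concrete illustration of the embarrassingly parallel pattern referred to in the statement above, the sketch below shows an OpenCL C kernel that performs a fully independent per-point evaluation over a grid. The kernel name, its arguments, and the stand-in expression are hypothetical and appear in neither paper.

// Illustrative sketch only: a pure "map" over grid points, with no
// communication between work-items -- the embarrassingly parallel pattern
// referred to above. A real effective-source expression would replace the
// stand-in formula below.
__kernel void evaluate_on_grid(__global const float *r,      // radial coordinate per point
                               __global const float *theta,  // angular coordinate per point
                               __global float       *src,    // output: one value per point
                               const int n)                   // number of grid points
{
    int i = get_global_id(0);
    if (i < n) {
        // Stand-in computation; each point is independent of all others.
        src[i] = cos(theta[i]) / (r[i] * r[i] + 1.0f);
    }
}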
“…In this section we detail the approach we have taken towards parallelism, not only to take advantage of the many cores of a single GPU, but also of those on multiple GPUs. We describe here the different ideas we have implemented and their final performance outcomes [13,14]. The lessons learned have ultimately helped us converge towards a near-optimal implementation.…”
Section: Code Implementation
confidence: 99%
“…In addition, it is necessary to establish the appropriate data communication between the GPU cores and the remaining code executing on the CPU; we use the clEnqueueReadBuffer and clEnqueueWriteBuffer calls to transfer data back and forth from main memory, and we only use global memory on the GPU to simplify communication between the GPU cores. We make this simplification with the goal of keeping the code's portability intact, even if it impacts performance to some extent [14].…”
Section: Code Implementation
confidence: 99%
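The transfer pattern described in this statement can be sketched on the host side as follows. This is an illustrative sketch only, under the assumption that the OpenCL context, command queue, kernel, and global-memory buffers have already been created (e.g. via clCreateCommandQueue and clCreateBuffer); the helper name run_step and its arguments are hypothetical.

/*
 * Illustrative host-side sketch (not the authors' code): copy input data
 * into a GPU global-memory buffer, launch a kernel, and copy the result
 * back to main memory with clEnqueueWriteBuffer / clEnqueueReadBuffer.
 */
#include <CL/cl.h>
#include <stddef.h>

cl_int run_step(cl_command_queue queue,
                cl_kernel        kernel,
                cl_mem           d_in,   /* global-memory buffer on the GPU */
                cl_mem           d_out,  /* global-memory buffer on the GPU */
                double          *h_in,   /* host-side input array           */
                double          *h_out,  /* host-side output array          */
                size_t           n)      /* number of grid points           */
{
    cl_int err;
    size_t bytes  = n * sizeof(double);
    size_t global = n;                   /* one work-item per grid point    */

    /* Host -> device: blocking write into GPU global memory. */
    err = clEnqueueWriteBuffer(queue, d_in, CL_TRUE, 0, bytes, h_in,
                               0, NULL, NULL);
    if (err != CL_SUCCESS) return err;

    /* Bind the device buffers as kernel arguments 0 and 1, then launch. */
    err  = clSetKernelArg(kernel, 0, sizeof(cl_mem), &d_in);
    err |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &d_out);
    if (err != CL_SUCCESS) return err;

    err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, NULL,
                                 0, NULL, NULL);
    if (err != CL_SUCCESS) return err;

    /* Device -> host: blocking read back to main memory. */
    return clEnqueueReadBuffer(queue, d_out, CL_TRUE, 0, bytes, h_out,
                               0, NULL, NULL);
}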