2013
DOI: 10.1016/j.jpdc.2012.07.008

Designing OP2 for GPU architectures

Abstract: OP2 is an "active" library framework for the solution of unstructured mesh applications. It aims to decouple the specification of a scientific application from its parallel implementation to achieve code longevity and near-optimal performance through re-targeting the back-end to different multi-core/many-core hardware. This paper presents the design of the current OP2 library for generating efficient code targeting contemporary GPU platforms. In this we focus on some of the software architecture design choices…
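The decoupling described in the abstract is exposed through a small declarative host API: the application declares sets, mappings and data, and expresses computation as parallel loops over sets, while OP2 generates the back-end-specific implementation. Below is a minimal sketch in the style of OP2's C/C++ host API (op_decl_set, op_decl_map, op_decl_dat, op_par_loop); the edge/cell mesh, the kernel body and all variable names are illustrative and not taken from the paper.

// Minimal OP2-style specification sketch; mesh sizes, names and the kernel
// body are illustrative. The op_* calls follow OP2's published host API.
#include "op_seq.h"           // OP2 developer (single-threaded) back-end

// User kernel: works on one edge and the two cells it connects. OP2's code
// generator can re-target loops over this kernel to CUDA, OpenMP or MPI
// back-ends without changes to the specification.
static inline void res_calc(const double *flux, double *cell0, double *cell1) {
  cell0[0] += flux[0];
  cell1[0] -= flux[0];
}

int main(int argc, char **argv) {
  op_init(argc, argv, 2);

  // In a real application these come from a mesh file.
  int nedges = 4, ncells = 4;
  int edge_to_cell[] = {0,1, 1,2, 2,3, 3,0};   // 2 cells per edge
  double flux[4] = {0}, residual[4] = {0};

  // Declare the mesh: sets, the mapping between them, and data on the sets.
  op_set edges  = op_decl_set(nedges, "edges");
  op_set cells  = op_decl_set(ncells, "cells");
  op_map e2c    = op_decl_map(edges, cells, 2, edge_to_cell, "e2c");
  op_dat d_flux = op_decl_dat(edges, 1, "double", flux, "flux");
  op_dat d_res  = op_decl_dat(cells, 1, "double", residual, "res");

  // Parallel loop over edges: OP2 chooses colouring, scheduling and data
  // layout for the selected back-end; the user states only what is computed.
  op_par_loop(res_calc, "res_calc", edges,
              op_arg_dat(d_flux, -1, OP_ID, 1, "double", OP_READ),
              op_arg_dat(d_res,   0, e2c,   1, "double", OP_INC),
              op_arg_dat(d_res,   1, e2c,   1, "double", OP_INC));

  op_exit();
  return 0;
}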

Cited by 28 publications (39 citation statements)
References 10 publications
“…The ability to adapt to the rapidly changing hardware landscape motivated the development of OP2 [8], [9], a successor to OPlus. While the initial motivation was to enable Hydra to exploit multi-core and many-core parallelism, OP2 was designed from the outset to be a general high-level active library framework to express and parallelize unstructured mesh based numerical computations.…”
Section: OP2 Library for Unstructured Grids (mentioning)
Confidence: 99%
“…OP2 holds them internally as C arrays and it is able to apply optimizing transformations in how the data is held in memory. Transformations include reordering mesh elements [16], partitioning (under MPI) and conversion to an array-of-structs data layout (for GPUs [9]). These transformations, and OP2's ability to seamlessly apply them internally is key to achieving a number of performance optimizations.…”
Section: Development and Code Generation with OP2 (mentioning)
Confidence: 99%
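The layout transformation mentioned in the statement above, switching a multi-component dataset between per-component and per-element storage, can be pictured with a short generic sketch. This is not OP2's internal code, only an illustration of the kind of memory-layout change the quoted text refers to; the function name and index conventions are made up for the sketch.

// Illustration only (not OP2 source): converting a dim-component dataset
// from a struct-of-arrays layout, soa[c * nelems + e], to an
// array-of-structs layout, aos[e * dim + c].
#include <cstddef>
#include <vector>

std::vector<double> soa_to_aos(const std::vector<double>& soa,
                               std::size_t nelems, std::size_t dim) {
  std::vector<double> aos(nelems * dim);
  for (std::size_t e = 0; e < nelems; ++e)
    for (std::size_t c = 0; c < dim; ++c)
      aos[e * dim + c] = soa[c * nelems + e];   // transpose element/component order
  return aos;
}

Because user kernels only ever see the pointers handed to them by op_par_loop, such a transformation can be applied internally, per back-end, without any change to application code; this is the "seamless" application the quoted statement highlights.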
“…Results presented in previous papers [4,6] report pure MPI and hybrid MPI+OpenMP performance on clusters of CPUs as well as MPI+CUDA performance results running on Fermi-generation NVIDIA GPUs. However, as newer hardware generations become available, it is necessary to revise optimization techniques due to changing performance characteristics and best practices; for example the Kepler generation of GPUs features a much higher number of cores per Scalar Multiprocessor (SMX) than the Fermi generation, but the amount of shared memory available remains unchanged.…”
Section: Optimizations to Existing Backends (mentioning)
Confidence: 99%
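As a concrete illustration of the resource limits that drive the re-tuning described above, the per-SM figures the statement mentions can be read off at run time with the CUDA runtime API. The following host-side sketch is not from the paper; it only uses standard cudaGetDeviceProperties fields.

// Host-side sketch (not from the paper): query the per-device and per-SM
// resources that re-tuning decisions such as the one described above depend on.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
  cudaDeviceProp prop;
  if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
    std::fprintf(stderr, "no CUDA device found\n");
    return 1;
  }
  std::printf("%s (compute capability %d.%d)\n", prop.name, prop.major, prop.minor);
  std::printf("multiprocessors:       %d\n", prop.multiProcessorCount);
  std::printf("shared memory per SM:  %zu bytes\n", prop.sharedMemPerMultiprocessor);
  std::printf("shared memory / block: %zu bytes\n", prop.sharedMemPerBlock);
  std::printf("registers per block:   %d\n", prop.regsPerBlock);
  return 0;
}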