2013
DOI: 10.1016/j.parco.2013.09.004

Design and initial performance of a high-level unstructured mesh framework on heterogeneous parallel systems

Cited by 24 publications (25 citation statements)
References 31 publications
“…Race conditions that occur during shared-memory execution on both CPUs (OpenMP) and GPUs (CUDA) are handled through multiple levels of coloring while for the distributed memory (MPI) parallelization, an owner-compute model [12] similar to that used in OPlus is implemented. More details on the various parallelization strategies and their performance implications can be found in previous papers [8], [9], [10], [12], and will be discussed in more detail in the subsequent sections, as we describe optimizations applied to them.…”
Section: Development and Code Generation With OP2 (mentioning)
confidence: 99%
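The coloring idea mentioned in the statement above can be illustrated with a minimal sketch. This is not OP2's implementation: it shows only a single level of greedy edge coloring on a hypothetical four-node mesh, and the names `color_edges`, `group_by_color`, and `node_sum` are invented for illustration. The point is that edges sharing a color touch disjoint nodes, so their indirect increments could run concurrently without racing.

```python
def color_edges(edges):
    """Greedy coloring: two edges that share a node never get the same color."""
    node_colors = {}  # node -> colors already used by edges touching that node
    colors = []
    for a, b in edges:
        used = node_colors.setdefault(a, set()) | node_colors.setdefault(b, set())
        c = 0
        while c in used:
            c += 1
        colors.append(c)
        node_colors[a].add(c)
        node_colors[b].add(c)
    return colors


def group_by_color(colors):
    """Return lists of edge indices, one list per color, in color order."""
    groups = {}
    for idx, c in enumerate(colors):
        groups.setdefault(c, []).append(idx)
    return [groups[c] for c in sorted(groups)]


# Hypothetical mesh: a square with one diagonal (4 nodes, 5 edges).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
colors = color_edges(edges)

node_sum = [0.0] * 4
for group in group_by_color(colors):
    # Edges in one group touch disjoint nodes, so this inner loop
    # could be executed in parallel without atomics or locks.
    for e in group:
        a, b = edges[e]
        node_sum[a] += 1.0
        node_sum[b] += 1.0
```

Colors are processed one after another, and the parallelism lives inside each color group. The "multiple levels" in the quoted statement presumably refer to applying a similar idea at more than one granularity, which this sketch does not attempt.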
“…The generated code and the OP2 platform specific back-end libraries are highly optimized utilizing the best low-level features of a target architecture to make an OP2 application achieve high performance including high computational efficiency and minimized memory traffic. In previous works, we have presented OP2's design and development [8], [9] and its performance on heterogeneous systems [10], [12] on simpler benchmarks, and demonstrated considerable performance gains on a diverse set of hardware.…”
Section: Introduction (mentioning)
confidence: 99%
“…With OP2 we demonstrated that both developer productivity as well as near-optimal performance could be achieved on a wide range of parallel hardware. Research published as a result of this work includes a number of performance analysis studies on standard CFD benchmark applications [23] as well as a full industrial-scale application from the production work-load at Rolls-Royce plc. [28].…”
Section: OPS (mentioning)
confidence: 99%
“…The build process to obtain a parallel executable as illustrated in Figure 4 follows that of OP2's code generation process [23]. The API calls in the application are parsed by the OPS source-to-source translator which will produce a modified main program and back-end specific code.…”
Section: Porting CloverLeaf To OPS (mentioning)
confidence: 99%
“…HeDCS is generally used in scientific and commercial applications including real time safety-critical application [2]. They are adopted by many High Performance Computing (HPC) system designers and vendors [3]. Different architectures for heterogeneous computing consist of mainframe with integrated vector unit, vector processors, attached processors, Multiprocessor and Multi computing systems and special purpose architectures [4]. Heterogeneity can be proposed in two types, namely, Function-level heterogeneity and Performance-level heterogeneity [5]. In function-level heterogeneity, systems combine general-purpose processors with special-purpose processors, such as vector units, floating-point co-processors, and input/output processors.…”
Section: Introduction 1.1 Heterogeneous Systems (mentioning)
confidence: 99%