2019
DOI: 10.1016/j.jocs.2016.10.015
Performance evaluation of explicit finite difference algorithms with varying amounts of computational and memory intensity

Abstract: Future architectures designed to deliver exascale performance motivate the need for novel algorithmic changes in order to fully exploit their capabilities. In this paper, the performance of several numerical algorithms, characterised by varying degrees of memory and computational intensity, is evaluated in the context of finite difference methods for fluid dynamics problems. It is shown that, by storing some of the evaluated derivatives as single thread- or process-local variables in memory, or recomputing the…
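The trade-off the abstract describes, storing an evaluated derivative for reuse versus recomputing it at each point of use, can be sketched as follows. This is an illustrative Python sketch only, not the paper's OpenSBLI/OPS code; the 1-D central-difference stencil, the array names, and the two-term right-hand side are all assumptions made for the example:

```python
import numpy as np

def rhs_store(u, dx):
    """Memory-intensive variant: evaluate the derivative once and
    store it in a work array that later terms reuse."""
    dudx = np.empty_like(u)
    dudx[1:-1] = (u[2:] - u[:-2]) / (2.0 * dx)  # second-order central difference
    dudx[0] = dudx[-1] = 0.0                    # crude boundary fill for the sketch
    # Both terms below read the stored array instead of recomputing it.
    return dudx + 0.5 * dudx**2

def rhs_recompute(u, dx):
    """Compute-intensive variant: re-evaluate the stencil wherever the
    derivative is needed, trading extra FLOPs for less memory traffic."""
    def ddx(v):
        d = np.empty_like(v)
        d[1:-1] = (v[2:] - v[:-2]) / (2.0 * dx)
        d[0] = d[-1] = 0.0
        return d
    return ddx(u) + 0.5 * ddx(u)**2

# Both variants produce the same result; they differ only in how many
# times the stencil is evaluated versus how much intermediate storage
# is kept live, which is the axis the paper's evaluation varies.
x = np.linspace(0.0, 1.0, 65)
u = np.sin(2.0 * np.pi * x)
a = rhs_store(u, x[1] - x[0])
b = rhs_recompute(u, x[1] - x[0])
```

On cache-rich CPUs the stored work array may be cheap to reuse, while on bandwidth-bound accelerators recomputation can win; which variant is faster is exactly what such a performance evaluation has to measure.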

Cited by 13 publications (29 citation statements); references 15 publications.
“…Validation consisted of running OpenSBLI on known problems and comparing its results against the legacy SBLI application solving the same problem. The correctness of the different algorithms on CPUs was previously reported in [22]. In the present work, the maximum difference between the algorithms over all conservative variables was found to be less than 10⁻¹² on each architecture, and the difference between the CPU and GPU runs was found to be less than 10⁻¹² for the number of iterations and the optimization options considered here.…”
Section: Performance (supporting)
confidence: 70%
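The validation described in this citation amounts to a max-norm comparison of the conservative variables between two runs. A minimal sketch of such a check — the 10⁻¹² tolerance comes from the quote, while the variable names, grid size, and the synthetic "runs" are assumptions for illustration:

```python
import numpy as np

def max_abs_difference(run_a, run_b):
    """Largest pointwise difference across all conservative variables
    (e.g. density, momenta, energy) between two solver runs, each given
    as a dict mapping variable name -> field array."""
    return max(float(np.max(np.abs(run_a[k] - run_b[k]))) for k in run_a)

# Hypothetical fields standing in for a CPU run and a GPU run of the
# same case; the GPU copy carries a tiny round-off-level perturbation.
rng = np.random.default_rng(0)
cpu = {"rho": rng.random((8, 8)), "rhou": rng.random((8, 8))}
gpu = {k: v + 1e-14 for k, v in cpu.items()}

diff = max_abs_difference(cpu, gpu)
passed = diff < 1e-12  # the tolerance reported for the runs in the quote
```

A check of this shape is architecture-agnostic: the same comparison can be applied between algorithm variants on one machine or between CPU and GPU runs of the same algorithm.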
“…It uses source-to-source translation to automatically parallelize applications written with this API. OPS is being used to parallelize a number of applications, including hydrodynamics [6], lattice Boltzmann codes [21], and CFD applications [22,23]. Currently supported parallel platforms include distributed-memory clusters (using MPI), multi-core CPUs including Intel's Xeon Phi many-core processors (using SIMD, OpenMP, MPI and OpenCL), and GPUs (using CUDA, OpenCL and OpenACC), including clusters of GPUs.…”
Section: OPS (Oxford Parallel Library for Structured Mesh Solvers) (mentioning)
confidence: 99%