2019
DOI: 10.1016/j.jocs.2016.10.015
Performance evaluation of explicit finite difference algorithms with varying amounts of computational and memory intensity

Abstract: Future architectures designed to deliver exascale performance motivate the need for novel algorithmic changes in order to fully exploit their capabilities. In this paper, the performance of several numerical algorithms, characterised by varying degrees of memory and computational intensity, is evaluated in the context of finite difference methods for fluid dynamics problems. It is shown that, by storing some of the evaluated derivatives as single thread- or process-local variables in memory, or recomputing the…
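The trade-off the abstract describes, storing an evaluated derivative for reuse versus recomputing it at each point of use, can be sketched as follows. This is an illustrative Python sketch only, not the paper's OpenSBLI/OPS code; the 1-D central-difference stencil, the array names, and the two-term right-hand side are all assumptions made for the example:

```python
import numpy as np

def rhs_store(u, dx):
    """Memory-intensive variant: evaluate the derivative once and
    store it in a work array that later terms reuse."""
    dudx = np.empty_like(u)
    dudx[1:-1] = (u[2:] - u[:-2]) / (2.0 * dx)  # second-order central difference
    dudx[0] = dudx[-1] = 0.0                    # crude boundary fill for the sketch
    # Both terms below read the stored array instead of recomputing it.
    return dudx + 0.5 * dudx**2

def rhs_recompute(u, dx):
    """Compute-intensive variant: re-evaluate the stencil wherever the
    derivative is needed, trading extra FLOPs for less memory traffic."""
    def ddx(v):
        d = np.empty_like(v)
        d[1:-1] = (v[2:] - v[:-2]) / (2.0 * dx)
        d[0] = d[-1] = 0.0
        return d
    return ddx(u) + 0.5 * ddx(u)**2

# Both variants produce the same result; they differ only in how many
# times the stencil is evaluated versus how much intermediate storage
# is kept live, which is the axis the paper's evaluation varies.
x = np.linspace(0.0, 1.0, 65)
u = np.sin(2.0 * np.pi * x)
a = rhs_store(u, x[1] - x[0])
b = rhs_recompute(u, x[1] - x[0])
```

On cache-rich CPUs the stored work array may be cheap to reuse, while on bandwidth-bound accelerators recomputation can win; which variant is faster is exactly what such a performance evaluation has to measure.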

Cited by 13 publications (29 citation statements); references 15 publications.
“…Validation consisted of running OpenSBLI on known problems and comparing its results against the legacy SBLI application solving the same problem. The correctness of the different algorithms on CPUs was previously reported in [22]. In the present work, the maximum difference between the algorithms over all conservative variables was found to be less than 10⁻¹² on each architecture, and the difference between the CPU and GPU runs was found to be less than 10⁻¹² for the number of iterations and the optimization options considered here.…”
Section: Performance (supporting)
confidence: 70%
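The validation described in this citation amounts to a max-norm comparison of the conservative variables between two runs. A minimal sketch of such a check — the 10⁻¹² tolerance comes from the quote, while the variable names, grid size, and the synthetic "runs" are assumptions for illustration:

```python
import numpy as np

def max_abs_difference(run_a, run_b):
    """Largest pointwise difference across all conservative variables
    (e.g. density, momenta, energy) between two solver runs, each given
    as a dict mapping variable name -> field array."""
    return max(float(np.max(np.abs(run_a[k] - run_b[k]))) for k in run_a)

# Hypothetical fields standing in for a CPU run and a GPU run of the
# same case; the GPU copy carries a tiny round-off-level perturbation.
rng = np.random.default_rng(0)
cpu = {"rho": rng.random((8, 8)), "rhou": rng.random((8, 8))}
gpu = {k: v + 1e-14 for k, v in cpu.items()}

diff = max_abs_difference(cpu, gpu)
passed = diff < 1e-12  # the tolerance reported for the runs in the quote
```

A check of this shape is architecture-agnostic: the same comparison can be applied between algorithm variants on one machine or between CPU and GPU runs of the same algorithm.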
“…It uses source-to-source translation to automatically parallelize applications written with this API. OPS is being used to parallelize a number of applications, including hydrodynamics [6], lattice Boltzmann codes [21], and CFD applications [22,23]. Currently supported parallel platforms include distributed-memory clusters (using MPI), multi-core CPUs including Intel's Xeon Phi many-core processors (using SIMD, OpenMP, MPI and OpenCL), and GPUs (using CUDA, OpenCL and OpenACC), including clusters of GPUs.…”
Section: OPS (Oxford Parallel Library for Structured Mesh Solvers) (mentioning)
confidence: 99%