2022
DOI: 10.2172/1894022

Nek5000/RS Performance on Advanced GPU Architectures

Abstract: …60439. For information about Argonne and its pioneering science and technology programs, see www.anl.gov.

DOCUMENT AVAILABILITY
Online Access: U.S. Department of Energy (DOE) reports produced after 1991 and a growing number of pre-1991 documents are available free at OSTI.GOV (http://www.osti.gov/), a service of the US Dept. of Energy's Office of Scientific and Technical Information.

Cited by 2 publications (2 citation statements)
References 9 publications
“…The Open Concurrent Compute Abstraction provides backends for CUDA, HIP, OpenCL, and DPC++, for performance portability across all the major GPU vendors [9]. We note that, at 80% parallel efficiency, which corresponds to 2-3 million points per MPI rank, NekRS is running at 0.1 to 0.3 seconds of wall clock time per timestep, which is nearly a 3x improvement over production runs of Nek5000 at a similar 80% strong-scale limit on CPU-based platforms [11].…”
Section: Flow Solver and Numerical Methods (mentioning)
confidence: 90%
“…The Open Concurrent Compute Abstraction provides backends for CUDA, HIP, OpenCL, and DPC++, for performance portability across all the major GPU vendors [8]. We note that, at 80% parallel efficiency, which corresponds to 2-3 million points per MPI rank, NekRS is running at 0.1 to 0.3 seconds of wall clock time per timestep, which is nearly a 3x improvement over production runs of Nek5000 at a similar 80% strong-scale limit on CPU-based platforms [9]. Nek5000 was originally developed for simulating turbulent flows with very high fidelity, i.e., DNS and LES.…”
Section: Methods (mentioning)
confidence: 90%
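The scaling figures quoted above (80% parallel efficiency at 2-3 million points per MPI rank) can be made concrete with a small calculation. The following is a minimal Python sketch that assumes the usual strong-scaling definition of parallel efficiency, eta = (t_ref * p_ref) / (t * p), measured relative to the smallest run. The function names and the timings in the usage example are hypothetical illustrations, not data or code from the report or from Nek5000/NekRS.

```python
def parallel_efficiency(t_ref: float, p_ref: int, t: float, p: int) -> float:
    """Strong-scaling efficiency of a run on p ranks (time t per step)
    relative to a reference run on p_ref ranks (time t_ref per step).
    1.0 means perfect scaling; values drop as communication dominates."""
    return (t_ref * p_ref) / (t * p)


def points_per_rank(total_points: float, ranks: int) -> float:
    """Average number of grid points owned by each MPI rank."""
    return total_points / ranks


def strong_scale_limit(runs, total_points, threshold=0.8):
    """Hypothetical helper: given (ranks, seconds_per_step) pairs for a fixed
    problem size, return (ranks, points_per_rank) for the largest run whose
    efficiency relative to the smallest run is still >= threshold."""
    runs = sorted(runs)          # order by rank count; smallest run is the reference
    p_ref, t_ref = runs[0]
    best = None
    for p, t in runs:
        if parallel_efficiency(t_ref, p_ref, t, p) >= threshold:
            best = (p, points_per_rank(total_points, p))
    return best


# Hypothetical timings for a fixed 2e9-point problem (illustrative only).
runs = [(250, 2.0), (500, 1.0), (1000, 0.6), (2000, 0.4)]
print(strong_scale_limit(runs, 2.0e9))
```

With these made-up timings the 1000-rank run keeps about 83% efficiency while the 2000-rank run falls to 62.5%, so the sketch reports an 80% strong-scale limit at 1000 ranks, or 2 million points per rank. Using points per rank rather than rank count as the summary metric is what allows limits to be compared across problem sizes and machines.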