“…The Open Concurrent Compute Abstraction (OCCA) provides backends for CUDA, HIP, OpenCL, and DPC++, enabling performance portability across all the major GPU vendors [8]. We note that, at 80% parallel efficiency, which corresponds to 2-3 million points per MPI rank, NekRS runs at 0.1 to 0.3 seconds of wall-clock time per timestep, which is nearly a 3x improvement over production runs of Nek5000 at a similar 80% strong-scaling limit on CPU-based platforms [9]. Nek5000 was originally developed for simulating turbulent flows with very high fidelity, i.e., direct numerical simulation (DNS) and large-eddy simulation (LES).…”
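
The excerpt quotes an 80% parallel efficiency without stating how it is computed. The sketch below assumes the conventional strong-scaling definition: efficiency relative to a reference run of the same total problem size. The function names, rank counts, timings, and mesh size are hypothetical values chosen for illustration; they are not data from NekRS or Nek5000.

```python
def strong_scaling_efficiency(t_ref, p_ref, t_p, p):
    """Parallel efficiency of a run on p ranks relative to a reference run
    on p_ref ranks, for a fixed total problem size (strong scaling).

    t_ref, t_p: wall-clock seconds per timestep on p_ref and p ranks.
    """
    return (t_ref * p_ref) / (t_p * p)


def points_per_rank(total_points, ranks):
    """Average number of grid points owned by each MPI rank."""
    return total_points / ranks


# Hypothetical numbers, for illustration only.
total_points = 1_000_000_000   # assumed fixed mesh size
t_ref, p_ref = 0.8, 100        # assumed reference: 0.8 s/step on 100 ranks
t_p, p = 0.25, 400             # assumed scaled run: 0.25 s/step on 400 ranks

efficiency = strong_scaling_efficiency(t_ref, p_ref, t_p, p)
load = points_per_rank(total_points, p)

print(f"parallel efficiency: {efficiency:.0%}")
print(f"points per rank:     {load:,.0f}")
```

With these assumed inputs, quadrupling the rank count cuts the time per step by a factor of 3.2 rather than 4, giving 80% efficiency at 2.5 million points per rank. Adding ranks beyond that point shrinks the per-rank workload further and, under this definition, typically lowers efficiency.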