Abstract

Modern hardware architectures such as GPUs and manycore processors are characterised by an abundance of compute capability relative to memory bandwidth. This makes them well-suited to solving temporally explicit and spatially compact discretisations of hyperbolic conservation laws. However, classical pressure-projection-based incompressible Navier-Stokes formulations do not fall into this category. One attractive formulation for solving incompressible problems on modern hardware is the method of artificial compressibility. When combined with explicit dual time stepping and a high-order Flux Reconstruction discretisation, the majority of operations can be cast as compute bound matrix-matrix multiplications that are well-suited for GPU acceleration and manycore processing. In this work, we develop a high-order cross-platform incompressible Navier-Stokes solver, via artificial compressibility and dual time stepping, in the PyFR framework. The solver runs on a range of computer architectures, from laptops to the largest supercomputers, via a platform-unified templating approach that can generate/compile CUDA, OpenCL and C/OpenMP code at runtime. The extensibility of the cross-platform templating framework defined within PyFR is clearly demonstrated, as is the utility of P-multigrid for convergence acceleration. The platform independence of the solver is verified on Nvidia Tesla P100 GPUs and Intel Xeon Phi 7210 KNL manycore processors with a 3D Taylor-Green vortex test case. Additionally, the solver is applied to a 3D turbulent jet test case at Re = 10,000, and strong scaling is reported up to 144 GPUs. The new software constitutes the first high-order accurate cross-platform implementation of an incompressible Navier-Stokes solver via artificial compressibility and P-multigrid accelerated dual time stepping to be published in the literature. The technology has applications in a range of sectors, including the maritime and automotive industries. Moreover, due to its cross-platform nature, the technology is well placed to remain relevant in an era of rapidly evolving hardware architectures.

Solution method: Artificial compressibility formulation discretised with a high-order Flux Reconstruction approach in space and P-multigrid accelerated dual time stepping in time.
Program summary