This paper documents the development of a multiple-Graphics Processing Unit (GPU) version of FUNWAVE-Total Variation Diminishing (TVD), an open-source model for solving the fully nonlinear Boussinesq wave equations using a high-order TVD solver. The numerical schemes of FUNWAVE-TVD, for both Cartesian and spherical coordinates, are rewritten in CUDA Fortran, with inter-GPU communication handled by the Message Passing Interface. Since FUNWAVE-TVD involves the discretization of high-order dispersive derivatives, the on-chip shared memory is utilized to reduce global memory access. To further optimize performance, the batched tridiagonal solver is scheduled simultaneously in multiple GPU streams, which can reduce the GPU execution time by 20-30%. The GPU version is validated through a benchmark test for wave runup on a complex shoreline geometry, as well as a basin-scale tsunami simulation of the 2011 Tohoku-oki event. Efficiency evaluation shows that, in comparison with the CPU version running on a 36-core HPC node, speedup ratios of 4-7 and above 10 are observed for single- and double-GPU runs, respectively. The performance metrics of the multiple-GPU implementation need to be further evaluated when appropriate.

Plain Language Summary

Numerical modeling of surface wave dynamics is necessary for coastal infrastructure design. FUNWAVE-Total Variation Diminishing is a widely accepted open-source wave model for simulating surface wave propagation and wave-driven processes in the nearshore region, as well as tsunami wave propagation at oceanic scales. Due to the complexity of the governing equations and the corresponding numerical methods, the modeling of wave dynamics usually depends on high-performance computing (HPC) clusters, which are both expensive and power-consuming. To address this problem, Graphics Processing Unit (GPU)-accelerated computing is introduced into FUNWAVE-Total Variation Diminishing for wave dynamics modeling in this study.
GPUs were originally used for image processing and visualization purposes in personal computers. Because GPUs have thousands of "cores" that can perform arithmetic computations simultaneously, they are now widely employed to accelerate computing-intensive tasks such as deep learning and engineering computations. We find that by porting the wave model to GPU devices, the modeling of surface wave dynamics over a large domain can be achieved on an affordable stand-alone PC with GPU cards installed.
Key Points:
• The fully nonlinear Boussinesq wave model FUNWAVE-TVD is ported to multiple GPUs for acceleration
• The GPU version is ideal for solving wave problems over large computational domains on a stand-alone machine

As an efficient, portable, and commercially available substitute for HPC clusters, GPUs have played a central role in implementing massive computation across a wide range of areas, among which GPU acceleration of computational fluid dynamics (CFD) algorithms has been one of the main areas in the past few years. The success of GPU computing is partly attributed to its capability of massive computation charac...