Abstract. A high-resolution (1/20°) global ocean general circulation model
with graphics processing unit (GPU) code implementations is developed based on
the LASG/IAP Climate System Ocean Model version 3 (LICOM3) under a
heterogeneous-compute interface for portability (HIP) framework. The dynamic
core and physics package of LICOM3 are both ported to the GPU, and
three-dimensional parallelization (also partitioned in the vertical direction) is
applied. With 384 AMD GPUs, the HIP version of LICOM3 (LICOM3-HIP) is 42 times
faster than the CPU version running on the same number of CPU cores. LICOM3-HIP
has excellent scalability; it can still obtain a speedup of more than 4 on
9216 GPUs compared to 384 GPUs. In this phase, we successfully
performed a test of 1/20° LICOM3-HIP using 6550 nodes and
26 200 GPUs, and at this scale, the model's speed still increased, reaching
approximately 2.72 simulated years per day (SYPD). By putting almost all the
computation processes inside GPUs, the time cost of data transfer between CPUs
and GPUs was reduced, resulting in high performance. Simultaneously, a 14-year
spin-up integration following phase 2 of the Ocean Model Intercomparison
Project (OMIP-2) protocol of surface forcing was performed, and preliminary
results were evaluated. We found that the model results had little difference
from the CPU version. Further comparison with observations and
lower-resolution LICOM3 results suggests that the 1/20° LICOM3-HIP
can reproduce the observations and produce many smaller-scale activities, such
as submesoscale eddies and frontal-scale structures.
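
To put the scaling figures quoted above in context, the following is a minimal sketch of the arithmetic, not part of the model code. The function names are illustrative. The speedup of 4.0 is the lower bound stated in the abstract ("more than 4"), so the resulting efficiency is also a lower bound. The wall-clock estimate assumes, hypothetically, that a 14-year integration runs at the 2.72 SYPD measured in the 26 200-GPU test; the abstract does not state the configuration actually used for the spin-up.

```python
def strong_scaling_efficiency(base_units, scaled_units, speedup):
    """Return observed speedup divided by the ideal (linear) speedup."""
    ideal_speedup = scaled_units / base_units
    return speedup / ideal_speedup


def wall_clock_days(simulated_years, sypd):
    """Return wall-clock days needed to simulate the given number of years."""
    return simulated_years / sypd


# Going from 384 to 9216 GPUs is a 24-fold increase in resources.
# A speedup of 4.0 (lower bound from the abstract) gives about 17 % efficiency.
efficiency = strong_scaling_efficiency(384, 9216, 4.0)

# Hypothetical: a 14-year integration at 2.72 SYPD takes roughly 5.1 days.
days = wall_clock_days(14, 2.72)

print(f"strong-scaling efficiency >= {efficiency:.3f}")
print(f"wall-clock time for 14 years at 2.72 SYPD: {days:.2f} days")
```

The efficiency figure is a lower bound only because the abstract gives the speedup as "more than 4"; the exact measured value would be needed for a precise number.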