Abstract. Precise and rapid air quality simulations and forecasting are
limited by the computational performance of the air quality model used, and
the gas-phase chemistry module is the most time-consuming function in the air
quality model. In this study, we designed a new framework for the widely used
Carbon Bond Mechanism Z (CBM-Z) gas-phase chemical kinetics kernel that
adapts it to the single-instruction, multiple-data (SIMD) technology in
next-generation processors and thereby improves its computational performance.
The optimization implements fine-grained parallelization of CBM-Z by
improving its vectorization ability. By constructing loops and
integrating the main branches, e.g., diverse chemistry sub-schemes, multiple
spatial points in the model can be processed simultaneously on vector
processing units (VPUs). Two generations of CPUs – Intel Xeon E5-2680 V4 and
Intel Xeon Gold 6132 – and the Intel Xeon Phi 7250 Knights Landing (KNL) were
used as the benchmark processors. Validation of the CBM-Z module outputs
indicates that the relative bias reaches a maximum of 0.025 % after 10 h of
integration with the -fp-model fast=1 compiler flag. The results of
the module test show that the Multiple-Points CBM-Z (MP CBM-Z) achieved
speedups of 5.16× and 8.97× on a single core of the Intel Xeon E5-2680
V4 and Intel Xeon Gold 6132 CPUs, respectively, and a speedup of
3.69× on KNL, compared with the performance of CBM-Z on the Intel Xeon E5-2680
V4 platform. For the single-node tests, the speedup on the two generations of
CPUs can reach 104.63× and 198.50× using message passing
interface (MPI) and 101.02× and 194.60× using OpenMP, and the
speedup on the KNL node can reach 175.23× using MPI and 167.45× using OpenMP. The speedup of
the optimized CBM-Z is approximately 40 % higher on a one-socket KNL
platform than on a two-socket Broadwell platform and about 13 %–16 %
lower than on a two-socket Skylake platform. We also tested a
three-dimensional chemistry transport model (CTM), the Nested Air Quality
Prediction Model System (NAQPMS), equipped with the MP CBM-Z. The tests
show a clear improvement in CTM performance after
adopting the MP CBM-Z: it yields speedups
of 3.32× and 1.96× for the gas-phase chemistry module and the whole CTM,
respectively, on the Intel Xeon E5-2680 V4 platform. Moreover, on the new Intel Xeon Gold 6132 platform,
the MP CBM-Z gains 4.90× and 2.22× speedups for the gas-phase
chemistry module and the whole CTM, respectively. For the KNL, the MP CBM-Z enables a
3.52× speedup for the gas-phase chemistry module, but the whole model
lost 24.10 % of its performance relative to the CPU platform because of the poor
performance of other modules. In addition, because this optimization seeks to
improve the utilization of the VPU, the model is better suited to
new-generation processors with more advanced SIMD technology. Our tests
already show that the benefit of upgrading the CPU increases by about
47 % with the MP CBM-Z, because the optimized code adapts better
to the new hardware. This work improves the performance of the
CBM-Z chemical kinetics kernel as well as the calculation efficiency of the
air quality model, which can directly improve the practical value of the air
quality model in scientific simulations and routine forecasting.
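
The restructuring summarized above, in which a kernel written for one spatial point becomes a loop over many points with its branches merged into a per-point selection, can be sketched in C as follows. This is a toy illustration under stated assumptions, not CBM-Z code: the function names step_point and step_points, the single first-order decay reaction, the rate constants 0.1 and 0.2, and the integer regime flag standing in for a chemistry sub-scheme are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical toy kernel: one explicit Euler step of first-order decay,
 * dc/dt = -k * c, where the rate constant k depends on a per-point
 * "regime" flag (a stand-in for a chemistry sub-scheme). */

/* Single-point form: one spatial point per call, with a branch on the
 * regime. Called point by point, it gives the compiler little to vectorize. */
double step_point(double c, int regime, double dt)
{
    double k;
    if (regime == 0) {
        k = 0.1;   /* assumed rate constant for regime 0 */
    } else {
        k = 0.2;   /* assumed rate constant for any other regime */
    }
    return c - dt * k * c;
}

/* Multi-point form: the same update written as a loop over n points.
 * The branch is expressed as a per-element select, the arrays are
 * contiguous, and restrict promises no aliasing, so a vectorizing
 * compiler may process several points per SIMD instruction. */
void step_points(double *restrict c, const int *restrict regime,
                 double dt, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        double k = (regime[i] == 0) ? 0.1 : 0.2;
        c[i] -= dt * k * c[i];
    }
}
```

Whether the loop is actually vectorized depends on the compiler and its flags; a vectorization report from the compiler is the usual way to check.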