The global climate model ECHAM/MESSy Atmospheric Chemistry (EMAC) is a modular global model that simulates climate change and air quality scenarios. The application includes different sub-models for the calculation of chemical species concentrations, their interaction with land and sea, and human interaction. The paper presents a source-to-source parser that enables support for Graphics Processing Units (GPUs) by the Kinetic Pre-Processor (KPP), a general-purpose open-source software tool. The requirements of the host system are also described. The source code of the source-to-source parser is available under the MIT License [1].

Keywords: GPU; CUDA; Chemical Kinetics; Climate modeling; Atmospheric Chemistry

Funding Statement: The research leading to these results has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 675121 and grant agreement No 676629. This work was also supported by the Cy-Tera Project, which is co-funded by the European Regional Development Fund and the Republic of Cyprus through the Research Promotion Foundation.

Alvanos and Theodoros: MEDINA Art. 13, p. 2 of 4

up to 2 KB when indirect accesses are used. All the methods that are available in the KPP numerical library under MECCA are supported.

The computation data structures are subdivided into runtime-specified arrays of columns in the atmosphere. The memory of each array is transferred to the GPU global memory, and each grid box is calculated on a separate GPU core to achieve massive parallelization, as shown in Figure 1.

The CUDA chemical kinetics solver comprises three steps, also presented as a flow chart in Figure 2:

1. The first step is the calculation of the reaction rate coefficients. The variable values are stored in a global array inside the GPU and used in the computational kernels.
2. The second step is the most computationally demanding, including mostly linear algebra functions for the ODE solvers.
The kernel selects the variant of the Rosenbrock solver method inside the GPU using an array of constant values in memory.
3. The third step kernel is used for statistical reduction and requires little computation time compared with the other kernels.

There are two files required to enable GPU utilization: i) f2c_alpha.py and ii) kpp_integrate_cuda_prototype.cu. The pre-processor is executed by running python f2c_alpha.py in the messy/util directory. When offloading to GPUs, the number of cells must not exceed 12288. The application calculates the number of cells by multiplying the number of columns by the number of levels of the atmosphere. The user can specify the number of columns by using the NVL[1] (NPROMA) runtime parameter in the EMAC configuration.
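The cell limit above is simple arithmetic and can be checked before a run. The following is a minimal Python sketch of that check, not code from EMAC or the parser: the helper names (cell_count, fits_on_gpu, max_columns) are hypothetical, and the column and level counts in the usage note are illustrative values only. The limit of 12288 cells is the one stated in the text.

```python
# Hypothetical helpers, not part of EMAC or the source-to-source parser.
# They only reproduce the arithmetic described in the text:
# cells = columns * levels, and cells must not exceed 12288.

MAX_GPU_CELLS = 12288  # limit quoted in the text


def cell_count(columns: int, levels: int) -> int:
    """Return the number of cells: columns multiplied by levels."""
    if columns <= 0 or levels <= 0:
        raise ValueError("columns and levels must be positive")
    return columns * levels


def fits_on_gpu(columns: int, levels: int) -> bool:
    """Return True when the cell count stays within the stated limit."""
    return cell_count(columns, levels) <= MAX_GPU_CELLS


def max_columns(levels: int) -> int:
    """Return the largest column count that respects the limit."""
    if levels <= 0:
        raise ValueError("levels must be positive")
    return MAX_GPU_CELLS // levels
```

For example, with an assumed 90 vertical levels, fits_on_gpu(128, 90) holds because 128 x 90 = 11520 cells, while 137 columns would give 12330 cells and exceed the limit; max_columns(90) yields 136.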
Quality control

To ensure the quality of the code, we conduct unit testing by comparing the GPU-accelerated simulation with a pure Fortran simulation for one model year, using 155 species and 310 reactions. We compare the output of chemical element concentrations between the CPU-only and accelerated v...