Abstract. This paper presents an application of GPU accelerators in Earth system modeling. We focus on atmospheric chemical kinetics, one of the most computationally intensive tasks in climate-chemistry model simulations. We developed a software package that automatically generates CUDA kernels to numerically integrate atmospheric chemical kinetics in the global climate model ECHAM/MESSy Atmospheric Chemistry (EMAC), used to study climate change and air quality scenarios. A source-to-source compiler outputs a CUDA-compatible kernel by parsing the FORTRAN code generated by the Kinetic PreProcessor (KPP) general analysis tool. All Rosenbrock methods that are available in the KPP numerical library are supported. Performance evaluation, using Fermi and Pascal CUDA-enabled GPU accelerators, shows kernel execution time speed-ups of 4.5× and 20.4×, respectively. A node-to-node real-world production performance comparison shows a 1.75× speed-up over the non-accelerated application using the KPP three-stage Rosenbrock solver. We provide a detailed description of the code optimizations used to improve the performance, including memory optimizations, control code simplification, and reduction of idle time. The accuracy and correctness of the accelerated implementation are evaluated by comparison with the CPU-only code of the application. The median relative difference is found to be less than 0.000000001 % when comparing the output of the accelerated kernel with that of the CPU-only code. The approach followed, including the computational workload division, and the developed GPU solver code can potentially be used as the basis for hardware acceleration of numerous geoscientific models that rely on KPP for atmospheric chemical kinetics applications.
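The accuracy check described above can be illustrated with a minimal sketch. The abstract does not state exactly how the relative difference is defined, so the formula used here (absolute difference divided by the absolute CPU value, expressed in percent, with zero-valued CPU entries skipped) is an assumption, and the function name and sample values are hypothetical.

```python
from statistics import median


def median_relative_difference(reference, candidate):
    """Median of |candidate - reference| / |reference|, in percent.

    Assumption: entries whose reference value is zero are skipped,
    because the relative difference is undefined there.
    """
    diffs = [
        abs(c - r) / abs(r) * 100.0
        for r, c in zip(reference, candidate)
        if r != 0.0
    ]
    if not diffs:
        raise ValueError("no nonzero reference values to compare")
    return median(diffs)


# Hypothetical concentrations from a CPU-only run and an accelerated run.
cpu_output = [1.0, 2.0, 4.0]
gpu_output = [1.5, 2.0, 5.0]
print(median_relative_difference(cpu_output, gpu_output))
```

The median is less sensitive than the mean to a few outlying grid cells, which may be one reason to report it; the abstract itself does not give the rationale.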
The global climate model ECHAM/MESSy Atmospheric Chemistry (EMAC) is used to study climate change and air quality scenarios. The EMAC model consists of a nonlocal dynamical part with low scalability and local physical/chemical processes with high scalability. The EMAC chemistry-climate model does not currently benefit from the accelerators that are now installed in many HPC systems. We study strategies to offload the calculation of the atmospheric chemistry to accelerator technologies (GPU and Intel MIC), as in typical model configurations this is the most computationally demanding subtask. The proposed solutions extend the Kinetic PreProcessor (KPP), a general-purpose open-source software tool used in atmospheric chemistry.
Abstract. Future multi-core processors will necessitate exploitation of fine-grain, architecture-independent parallelism from applications to utilize many cores with relatively small local memories. We use c264, an end-to-end H.264 video encoder for the Cell processor based on x264, to show that exploiting fine-grain parallelism remains challenging and requires significant advancement in runtime support. Our implementation of c264 achieves speedups between 4.7× and 8.6× on six synergistic processing elements (SPEs), compared to the serial version running on the power processing element (PPE). We find that the programming effort associated with efficient parallelization of c264 at fine granularity is highly non-trivial. Hand optimizations may improve performance significantly but are eventually limited by the code restructuring they require. We assess the complexity of exploiting fine-grain parallelism in realistic applications by identifying optimizations of c264 and the effort they require.