The solution of chemical kinetics is one of the most computationally intensive tasks in atmospheric chemical transport simulations. Due to the stiff nature of the system, implicit time stepping algorithms which repeatedly solve linear systems of equations are necessary. This paper reviews the issues and challenges associated with the construction of efficient chemical solvers, discusses several families of algorithms, presents strategies for increasing computational efficiency, and gives insight into implementing chemical solvers on accelerated computer architectures.
This work presents the Kinetics PreProcessor: Accelerated (KPPA), a general analysis and code generation tool that achieves significantly reduced time-to-solution for chemical kinetics kernels on three multicore platforms: NVIDIA GPUs using CUDA, the Cell Broadband Engine, and Intel Quad-Core Xeon CPUs. A comparative performance analysis of chemical kernels from WRFChem and the Community Multiscale Air Quality Model (CMAQ) is presented for each platform in double and single precision on coarse and fine grids. We introduce the multicore architecture parameterization that KPPA uses to generate a chemical kernel for these platforms and describe a code generation system that produces highly tuned platform-specific code. Compared to state-of-the-art serial implementations, speedups exceeding 25Â are regularly observed, with a maximum observed speedup of 41:1Â in single precision.
Event traces are helpful in understanding the performance behavior of message-passing applications since they allow in-depth analyses of communication and synchronization patterns. However, the absence of synchronized hardware clocks may render the analysis ineffective because inaccurate relative event timings can misrepresent the logical event order and lead to errors when quantifying the impact of certain behaviors. Although linear offset interpolation can restore consistency to some degree, inaccuracies and time-dependent drifts may still disarrange the original succession of events -especially during longer runs. In our earlier work, we have presented an algorithm that removes the remaining violations of the logical event order postmortem and, in addition, have outlined the initial design of a parallel version. Here, we complete the parallel design and describe its implementation within the SCALASCA trace-analysis framework. We demonstrate its suitability for large-scale applications running on more than thousand application processes and show how the correction can improve the trace analysis of a real-world application example.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.