This study compares the performance of high-order discontinuous Galerkin finite elements on modern hardware. The main computational kernel is the matrix-free evaluation of differential operators by sum factorization, exemplified on the symmetric interior penalty discretization of the Laplacian as a metric for a complex application code in fluid dynamics. State-of-the-art implementations of these kernels stress both arithmetic and memory transfer. The implementations of SIMD vectorization and shared-memory parallelization are detailed. Computational results are presented for dual-socket Intel Haswell CPUs with 28 cores, a 64-core Intel Knights Landing, and a 16-core IBM Power8 processor. Up to polynomial degree six, Knights Landing is approximately twice as fast as Haswell. Power8 performs similarly to Haswell, trading a higher frequency for narrower SIMD units. The performance comparison shows that simple ways to express parallelism through for loops perform better on medium and high core counts than a more elaborate task-based parallelization with dynamic scheduling according to dependency graphs, despite less memory transfer in the latter algorithm.
Abstract. In the present contribution, an overview of the sampling-based XSUSA method for sensitivity and uncertainty analysis with respect to nuclear data is given. The focus is on recent developments and applications of XSUSA. These applications include calculations for critical assemblies, fuel assembly depletion calculations, and steady-state as well as transient reactor core calculations. The analyses are partially performed in the framework of international benchmark working groups (UACSA: Uncertainty Analyses for Criticality Safety Assessment; UAM: Uncertainty Analysis in Modelling). It is demonstrated that, particularly for full-scale reactor calculations, the influence of the nuclear data uncertainties on the results can be substantial. For instance, for the radial fission rate distributions of mixed UO2/MOX light water reactor cores, the 2σ uncertainties in the core centre and periphery can exceed 10%. For a fast transient, the resulting time behaviour of the reactor power was covered by a wide uncertainty band. Overall, the results confirm the necessity of adding systematic uncertainty analyses to best-estimate reactor calculations.
Despite the enormous increase in computational power in the last decades, the numerical study of complex flows remains challenging. State-of-the-art techniques to simulate hyperbolic flows with discontinuities rely on computationally demanding nonlinear schemes, such as Riemann solvers with weighted essentially non-oscillatory (WENO) stencils and characteristic decomposition. To handle this complexity, the numerical load can be reduced via a multiresolution (MR) algorithm with local time stepping (LTS) running on modern high-performance computing (HPC) systems. Ultimately, the main challenge lies in an efficient utilization of the available HPC hardware. In this work, we evaluate the performance improvement for a Message Passing Interface (MPI)-parallelized MR solver using single instruction multiple data (SIMD) optimizations. We present straightforward code modifications that allow for auto-vectorization by the compiler, while maintaining the modularity of the code at comparable performance. We demonstrate performance improvements for representative Euler flow examples on both Intel Haswell and Intel Knights Landing Xeon Phi microarchitecture (KNL) clusters. The tests show single-core speedups of 1.7 (1.9) and average speedups of 1.4 (1.6) for the Haswell (KNL).