2017
DOI: 10.1007/978-3-319-58667-0_13

Fast Matrix-Free Discontinuous Galerkin Kernels on Modern Computer Architectures

Abstract: This study compares the performance of high-order discontinuous Galerkin finite elements on modern hardware. The main computational kernel is the matrix-free evaluation of differential operators by sum factorization, exemplified on the symmetric interior penalty discretization of the Laplacian as a metric for a complex application code in fluid dynamics. State-of-the-art implementations of these kernels stress both arithmetics and memory transfer. The implementations of SIMD vectorization and shared-memory par…

Cited by 15 publications (21 citation statements)
References 10 publications
“…However, LDF bases turn out to be potentially more computationally expensive than Legendre bases for high orders: the Legendre basis (13) possesses a tensor product structure (component × basis function) which allows optimized computation of conserved states u and Gaussian quadrature using sum factorization (see Kronbichler et al 2017). These optimizations cannot be readily used with LDF bases, because the coupling between components of B breaks the tensor product structure.…”
Section: Locally Divergence-free Bases
Citation type: mentioning
confidence: 99%
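
The tensor product structure mentioned in the statement above can be illustrated with a small sketch of sum factorization in two dimensions. This is not the cited authors' implementation: the function names, the monomial shape matrix `S`, and the coefficient array `c` are all hypothetical and chosen only for illustration. The sketch evaluates u(q1, q2) = Σ_ij S[q1][i] · S[q2][j] · c[i][j] once directly and once as two one-dimensional sweeps, which reduces the work per element from O(n⁴) to O(n³) in 2D.

```python
def evaluate_naive(S, c):
    """Evaluate u[q1][q2] = sum_{i,j} S[q1][i] * S[q2][j] * c[i][j] directly."""
    nq, n = len(S), len(S[0])
    return [[sum(S[q1][i] * S[q2][j] * c[i][j]
                 for i in range(n) for j in range(n))
             for q2 in range(nq)]
            for q1 in range(nq)]


def evaluate_sum_factorized(S, c):
    """Compute the same values with two 1D sweeps (sum factorization)."""
    nq, n = len(S), len(S[0])
    # Sweep 1: contract over j, giving tmp[i][q2].
    tmp = [[sum(S[q2][j] * c[i][j] for j in range(n)) for q2 in range(nq)]
           for i in range(n)]
    # Sweep 2: contract over i, giving u[q1][q2].
    return [[sum(S[q1][i] * tmp[i][q2] for i in range(n)) for q2 in range(nq)]
            for q1 in range(nq)]


# Hypothetical 1D shape matrix: monomials 1, x, x^2 sampled at three points.
points = [0.0, 0.5, 1.0]
S = [[x ** i for i in range(3)] for x in points]
# Hypothetical coefficients c[i][j] of the tensor product basis.
c = [[1.0, 2.0, 0.0], [0.0, 1.0, 0.0], [3.0, 0.0, 1.0]]

u_naive = evaluate_naive(S, c)
u_fast = evaluate_sum_factorized(S, c)
max_diff = max(abs(a - b)
               for row_a, row_b in zip(u_naive, u_fast)
               for a, b in zip(row_a, row_b))
```

The factorized version only works because every basis function is a product of 1D functions; a basis that couples components, as the statement notes for LDF bases, does not split into independent sweeps this way.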
“…A stabilization of the DG discretization for under‐resolved flows based on a consistent divergence penalty term and a consistent continuity penalty term has been developed in the works of Krank et al and Fehn et al, which renders this approach a highly attractive candidate for implicit LES. For example, this approach has been applied to large‐scale computations of turbulent flows such as the direct numerical simulation of periodic hill flow in the work of Krank et al. Our implementation is based on high‐performance matrix‐free methods for tensor product finite elements developed recently in the works of Kronbichler and Kormann for both continuous and discontinuous Galerkin methods, exhibiting excellent performance characteristics. Previous work on matrix‐free methods for continuous spectral element methods can be found in the works of Vos et al and Cantwell et al as well as the references therein.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…[16,17] In the present work, we propose a performance-optimized DG spectral element approach for the solution of the compressible Navier-Stokes equations based on a generic matrix-free implementation for quadrilateral/hexahedral elements with a focus on the solution of under-resolved turbulent incompressible flows. The matrix-free implementation used in this work has been shown to exhibit outstanding performance characteristics [18-20] with the throughput for operator evaluation measured in degrees of freedom processed per second almost independent of the polynomial degree k. High-performance implementations for high-order DG discretizations have also been proposed recently in the work of Müthing et al. [21], and matrix-free implementations for continuous spectral element methods are discussed, for example, in the works of Vos et al. [22], Cantwell et al. [23], May et al. [24], and Kronbichler et al. [25]

*The term "modern computer hardware" refers to current cache-based, multicore CPU architectures for which the Flop/Byte ratio, a characteristic quantity specifying the hardware properties, exhibits a value significantly larger than 1 [13]. It should be kept in mind that the selection of optimal algorithms (providing the best overall efficiency) among different realizations with different memory and compute requirements therefore depends on recent developments in computer hardware.…”
Section: Motivation
Citation type: mentioning
confidence: 99%
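
The role of the Flop/Byte ratio in the footnote above can be made concrete with a simplified roofline-style estimate. The quoted statement does not use this model; it is swapped in here purely as an illustration. All hardware numbers and kernel intensities below are hypothetical, not measurements from the cited work.

```python
def machine_balance(peak_gflops, bandwidth_gbs):
    """Flop/Byte ratio of the hardware: peak arithmetic rate over memory bandwidth."""
    return peak_gflops / bandwidth_gbs


def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound: limited either by memory traffic or by peak arithmetic."""
    return min(peak_gflops, intensity * bandwidth_gbs)


# Hypothetical machine: 1000 GFlop/s peak, 100 GB/s memory bandwidth.
PEAK, BANDWIDTH = 1000.0, 100.0
balance = machine_balance(PEAK, BANDWIDTH)

# Two hypothetical kernels with different arithmetic intensity (Flop per byte moved).
low_intensity = attainable_gflops(2.0, PEAK, BANDWIDTH)    # below the balance point
high_intensity = attainable_gflops(20.0, PEAK, BANDWIDTH)  # above the balance point
```

Under these assumed numbers the machine balance is 10 Flop/Byte, so the first kernel is limited by memory transfer and the second by arithmetic. This is why the preferred algorithm can change as the hardware balance shifts.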