Enhancing data locality of the conjugate gradient method for high-order matrix-free finite-element implementations

Kronbichler, Martin; Sashko, Dmytro; Münch, Peter

doi:10.1177/10943420221107880

Cited by 12 publications

(10 citation statements)

References 49 publications


“…We first assess the efficiency of the global solver, comparing the matrix‐based approach and the matrix‐free approach, see References 49‐52. In the matrix‐based approach, the global system

\left(\mathbf{D}-\mathbf{C}{\mathbf{A}}^{-1}\mathbf{B}\right)\hat{\mathbf{u}}=\mathbf{f}

must be explicitly formed at the macro‐element level, assembled and stored at the global level, and then solved via an iterative solver.…”

Section: Computational Efficiency: Numerical Tests With a Parallel Im... (mentioning)

confidence: 99%
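The difference between the two approaches in the statement above can be sketched with small dense blocks. This is a minimal illustration, not the citing paper's implementation: the block sizes, the random test data, and the function names `make_blocks`, `schur_matrix`, and `schur_apply` are all assumptions made for the example. The matrix-based variant forms D - C A^{-1} B explicitly, while the matrix-free variant only applies it to a vector through a local solve with A.

```python
import numpy as np


def make_blocks(n_local=6, n_trace=4, seed=0):
    """Build small, well-conditioned dense test blocks (hypothetical data)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n_local, n_local)) + n_local * np.eye(n_local)
    B = rng.standard_normal((n_local, n_trace))
    C = rng.standard_normal((n_trace, n_local))
    D = rng.standard_normal((n_trace, n_trace)) + 10 * n_trace * np.eye(n_trace)
    return A, B, C, D


def schur_matrix(A, B, C, D):
    """Matrix-based: explicitly form and store S = D - C A^{-1} B."""
    return D - C @ np.linalg.solve(A, B)


def schur_apply(A, B, C, D, u_hat):
    """Matrix-free: return (D - C A^{-1} B) u_hat without forming S."""
    w = B @ u_hat                # map trace unknowns to local unknowns
    z = np.linalg.solve(A, w)    # local solve, z = A^{-1} B u_hat
    return D @ u_hat - C @ z     # combine with the trace block


A, B, C, D = make_blocks()
u_hat = np.arange(1.0, 5.0)

S = schur_matrix(A, B, C, D)
matrix_based = S @ u_hat
matrix_free = schur_apply(A, B, C, D, u_hat)
print(np.allclose(matrix_based, matrix_free))
```

An iterative solver only needs the action of the operator on a vector, so `schur_apply` is all it requires; in a real code the solve with A would act on one macro-element at a time rather than on a single dense block.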

A matrix‐free macro‐element variant of the hybridized discontinuous Galerkin method

Badrkhani

Hiemstra

Mika

et al. 2023

Numerical Meth Engineering


Summary: We investigate a macro‐element variant of the hybridized discontinuous Galerkin (HDG) method, using patches of standard simplicial elements that can have non‐matching interfaces. Coupled via the HDG technique, our method enables local refinement by uniform simplicial subdivision of each macro‐element. By enforcing one spatial discretization for all macro‐elements, we arrive at local problems per macro‐element that are embarrassingly parallel, yet well balanced. Therefore, our macro‐element variant scales efficiently to n‐node clusters and can be tailored to available hardware by adjusting the local problem size to the capacity of a single node, while still using moderate polynomial orders such as quadratics or cubics. Increasing the local problem size means simultaneously decreasing, in relative terms, the global problem size, hence effectively limiting the proliferation of degrees of freedom. The global problem is solved via a matrix‐free iterative technique that also heavily relies on macro‐element local operations. We investigate and discuss the advantages and limitations of the macro‐element HDG method via an advection‐diffusion model problem.


“…We base our implementation on the deal.II library [65], taking advantage of the specific structure of the linear problem, and leveraging the properties of modern computer hardware, such as the availability of vectorized SIMD instructions and parallelism, by choosing to implement the preconditioner (and the solver) using the matrix‐free approach [42], which is very well supported in the deal.II library. The CPU cache efficiency is improved because the data is accessed in a more localized manner, reducing the number of cache misses and increasing the overall performance of the solver [66]. Furthermore, the solver's memory footprint is kept low, which means that it can deal with larger problems or function on computers with limited memory resources.…”

Section: A Multilevel Matrix‐free Preconditioner For the Linear System (mentioning)

confidence: 99%


Exploiting high‐contrast Stokes preconditioners to efficiently solve incompressible fluid–structure interaction problems

Wichrowski,

Krzyżanowski,

Heltai

et al. 2023

Numerical Meth Engineering


In this work, we develop a new algorithm to solve large‐scale incompressible time‐dependent fluid–structure interaction problems using a matrix‐free finite element method in arbitrary Lagrangian–Eulerian frame of reference. We derive a semi‐implicit time integration scheme which improves the geometry‐convective explicit scheme for problems involving the interaction between incompressible hyperelastic solids and incompressible fluids. The proposed algorithm relies on the reformulation of the time‐discrete problem as a generalized Stokes problem with strongly variable coefficients, for which optimal preconditioners have recently been developed. The resulting algorithm is scalable, optimal, and robust: we test our implementation on model problems that mimic classical Turek–Hron benchmarks in two and three dimensions, and investigate timing and scalability results.


“…Storing and loading the information for each quadrature point gives a memory-bound algorithm (Kronbichler and Kormann, 2019). In our experiments, we use, on affine meshes, compression that is applicable as J_q is the same on several quadrature points and compute the metric terms from a triquadratic representation of a deformed cell geometry (Kronbichler et al., 2022).…”

Section: Matrix-free Operator Evaluation (mentioning)

confidence: 99%
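The memory argument in the statement above can be made concrete with a simple count. This is a sketch under stated assumptions, not the data layout of any particular library: the function name `jacobian_storage` is hypothetical, one dense dim x dim Jacobian J_q is assumed per stored point, and an affine cell is assumed to need exactly one Jacobian because J_q is identical at all of its quadrature points.

```python
def jacobian_storage(n_cells, n_q_points, dim, affine):
    """Count the stored Jacobian entries for a mesh (illustrative model).

    affine=False: one dim x dim Jacobian per quadrature point.
    affine=True:  one dim x dim Jacobian per cell, shared by all points.
    """
    entries_per_jacobian = dim * dim
    stored_per_cell = 1 if affine else n_q_points
    return n_cells * stored_per_cell * entries_per_jacobian


# Hypothetical setup: 1000 hexahedral cells, degree 4, (4 + 1)^3 points per cell.
n_cells, degree, dim = 1000, 4, 3
n_q_points = (degree + 1) ** dim

full = jacobian_storage(n_cells, n_q_points, dim, affine=False)
compressed = jacobian_storage(n_cells, n_q_points, dim, affine=True)
print(full, compressed, full // compressed)
```

In this model the compression factor equals the number of quadrature points per cell, which is why the saving grows quickly with the polynomial degree.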

“…In the context of, for example, discontinuous Galerkin methods, it is possible to store only the index of an unknown of a cell, assuming that all DoFs related to a cell are enumerated contiguously. A similar strategy can be also adopted to standard high-order FEM in the case that DoFs are contiguous on each geometric entity, as proposed by Kronbichler et al. (2022). Here, one needs to store only the first index of each entity of a cell (3^d).…”

Section: Restriction and Data Locality In Cell Loops (mentioning)

confidence: 99%
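The count of 3^d entities and the resulting index saving can be checked with a short calculation. The sketch below assumes hypercube cells and a nodal element of degree p with (p - 1)^k DoFs in the interior of each k-dimensional entity (vertices, edges, faces, cell interior); the function names are hypothetical and chosen for the example.

```python
from math import comb


def entity_counts(dim):
    """Number of k-dimensional entities of a dim-cube, for k = 0..dim."""
    return [comb(dim, k) * 2 ** (dim - k) for k in range(dim + 1)]


def dofs_per_entity(degree, k):
    """Interior DoFs on one k-dimensional entity (assumed nodal element)."""
    return (degree - 1) ** k


def index_storage(dim, degree):
    """Return (indices per cell if every DoF is stored,
    indices per cell if only the first index of each entity is stored)."""
    counts = entity_counts(dim)
    full = sum(n * dofs_per_entity(degree, k) for k, n in enumerate(counts))
    compressed = sum(counts)
    return full, compressed


full, compressed = index_storage(dim=3, degree=4)
print(entity_counts(3), full, compressed)
```

For d = 3 the entities are 8 vertices, 12 edges, 6 faces, and 1 cell interior, 27 = 3^3 in total, while the full index list holds (p + 1)^3 entries per cell. The compressed list is therefore independent of the polynomial degree.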

Cache-optimized and low-overhead implementations of additive Schwarz methods for high-order FEM multigrid computations

Munch,

Kronbichler

2023

The International Journal of High Performance Computing Applica

Self Cite


This contribution presents data-locality optimizations of the additive Schwarz method (ASM) based on the fast-diagonalization method defined on overlapping cell-centric and vertex-star patches in the context of high-order matrix-free finite-element computations on modern CPU-based hardware. The developments are guided by detailed performance models of the ASM in the context of Chebyshev iterations when used as smoothers for p-multigrid. The proposed efficient implementation of ASM adopts concepts known from cell-loop infrastructures for efficient operator evaluation, in particular, the storage of information per geometric entity and the cache-friendly interleaving of cell loops and vector updates as a means to increase data locality. We use the latter concept for both applying the weights required by ASM and performing the vector updates required by the Chebyshev iteration, which are memory-bound operations with non-negligible costs in comparison to efficient operator evaluation. Furthermore, the solution of a scalar Poisson problem on a highly anisotropic and an unstructured mesh with p-multigrid using the developed smoothers indicates that efficient implementations of the additive Schwarz method can outperform optimized point-Jacobi preconditioners already for simple problems despite being more than twice as expensive per iteration. Even though ASM introduces additional communication steps per smoother application, the reduced number of iterations can lead to improved parallel scalability for intermediate problem sizes. At the scaling limit, the results are inconclusive due to these two opposing effects.

