2022
DOI: 10.1177/10943420221107880

Enhancing data locality of the conjugate gradient method for high-order matrix-free finite-element implementations

Abstract: This work investigates a variant of the conjugate gradient (CG) method and embeds it into the context of high-order finite-element schemes with fast matrix-free operator evaluation and cheap preconditioners like the matrix diagonal. Relying on a data-dependency analysis and appropriate enumeration of degrees of freedom, we interleave the vector updates and inner products in a CG iteration with the matrix-vector product with only minor organizational overhead. As a result, around 90% of the vector entries of th…
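The abstract describes interleaving the vector updates and inner products of a CG iteration with the matrix-vector product, using the matrix diagonal as preconditioner. The following Python sketch illustrates only the general loop-fusion idea under stated assumptions; it is not the paper's CG variant. A hypothetical 1D variable-coefficient stencil (coefficients `k`, function name `fused_pcg`, both illustrative) stands in for the high-order finite-element operator, and the fusion relies on the stencil at entry i reading only entries i-1, i and i+1.

```python
import numpy as np


def fused_pcg(k, b, tol=1e-10, max_iter=1000):
    """Jacobi-preconditioned CG with loop fusion for a hypothetical matrix-free 1D operator.

    k holds n + 1 edge coefficients; the operator (never stored) is
        (A v)_i = (k[i] + k[i+1]) v_i - k[i] v_{i-1} - k[i+1] v_{i+1},
    with v_{-1} = v_n = 0 (homogeneous Dirichlet boundaries).
    """
    n = b.size
    diag = k[:-1] + k[1:]            # matrix diagonal, used as Jacobi preconditioner
    x = np.zeros(n)
    r = b.astype(float)              # residual for the initial guess x = 0 (a copy of b)
    z = r / diag
    p = np.zeros(n)
    Ap = np.zeros(n)
    rz = float(r @ z)
    beta = 0.0                       # first iteration yields p = z
    norm_b = np.linalg.norm(b)
    if norm_b == 0.0:
        return x, 0
    for it in range(max_iter):
        # Sweep 1: update p[i+1] just before the stencil at i reads it,
        # apply A to p, and accumulate p.Ap in the same pass.
        p[0] = z[0] + beta * p[0]
        pAp = 0.0
        for i in range(n):
            if i + 1 < n:
                p[i + 1] = z[i + 1] + beta * p[i + 1]
            left = p[i - 1] if i > 0 else 0.0
            right = p[i + 1] if i + 1 < n else 0.0
            Ap[i] = diag[i] * p[i] - k[i] * left - k[i + 1] * right
            pAp += p[i] * Ap[i]
        alpha = rz / pAp
        # Sweep 2: remaining vector updates fused with the inner products.
        rz_new = 0.0
        rr = 0.0
        for i in range(n):
            x[i] += alpha * p[i]
            r[i] -= alpha * Ap[i]
            z[i] = r[i] / diag[i]
            rz_new += r[i] * z[i]
            rr += r[i] * r[i]
        if np.sqrt(rr) <= tol * norm_b:
            return x, it + 1
        beta = rz_new / rz
        rz = rz_new
    return x, max_iter
```

For example, `x, iterations = fused_pcg(np.ones(101), np.ones(100))` solves a 100-unknown problem. Each iteration makes two passes over the vectors instead of one loop per vector operation; in compiled code this reduction in memory traffic is what matters, while the Python loops here only show the loop structure.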

Cited by 12 publications (10 citation statements); references 49 publications.
“…We first assess the efficiency of the global solver, comparing the matrix‐based approach and the matrix‐free approach, see References 49‐52. In the matrix‐based approach, the global system $$ \left(\mathbf{D}-\mathbf{C}\mathbf{A}^{-1}\mathbf{B}\right)\hat{\mathbf{u}}=\mathbf{f} $$ must be explicitly formed at the macro‐element level, assembled and stored at the global level, and then solved via an iterative solver.…”
Section: Computational Efficiency: Numerical Tests With a Parallel Im... (mentioning; confidence: 99%)
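The statement contrasts forming $\mathbf{D}-\mathbf{C}\mathbf{A}^{-1}\mathbf{B}$ explicitly with a matrix-free approach. A minimal NumPy sketch of the difference, assuming small dense blocks (the function names are hypothetical, and the macro-element structure of the citing work is not modelled): the matrix-free variant only provides the action $v \mapsto \mathbf{D}v - \mathbf{C}\,(\mathbf{A}^{-1}(\mathbf{B}v))$, which is all an iterative solver such as CG needs.

```python
import numpy as np


def schur_matvec(A, B, C, D, v):
    """Matrix-free action of S = D - C A^{-1} B on v; S is never formed."""
    w = np.linalg.solve(A, B @ v)    # w = A^{-1} (B v): one solve per application
    return D @ v - C @ w


def schur_assembled(A, B, C, D):
    """Matrix-based alternative: form S explicitly (one solve per column of B)."""
    return D - C @ np.linalg.solve(A, B)
```

The assembled variant pays for all columns up front and must store S; the matrix-free variant pays one solve with A per operator application but stores nothing beyond the blocks themselves.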
“…We base our implementation on the deal.II library [65], taking advantage of the specific structure of the linear problem, and leveraging the properties of modern computer hardware, such as the availability of vectorized SIMD instructions and parallelism, by choosing to implement the preconditioner (and the solver) using the matrix‐free approach [42], which is very well supported in the deal.II library. The CPU cache efficiency is improved because the data is accessed in a more localized manner, reducing the number of cache misses and increasing the overall performance of the solver [66]. Furthermore, the solver's memory footprint is kept low, which means that it can deal with larger problems or function on computers with limited memory resources.…”
Section: A Multilevel Matrix‐free Preconditioner For the Linear System (mentioning; confidence: 99%)
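To put the memory-footprint claim in perspective, a rough upper bound on sparse-matrix storage can be computed. This is a back-of-the-envelope sketch under stated assumptions, not a measurement from the cited work: continuous $Q_k$ elements on a structured hexahedral mesh where $2^d$ cells share each vertex, 8-byte values, 32-bit column indices, and row pointers ignored. It also ignores the geometry data a matrix-free evaluation needs, so it is not a full comparison. The function name is hypothetical.

```python
def csr_bytes_per_row_bound(k, d=3, value_bytes=8, index_bytes=4):
    """Upper bound on CSR storage per row for continuous Q_k elements.

    A DoF at a mesh vertex couples to every DoF of the 2**d cells sharing
    that vertex, i.e. to (2k + 1)**d DoFs; each stored nonzero needs a
    value and a column index.
    """
    nonzeros = (2 * k + 1) ** d
    return nonzeros * (value_bytes + index_bytes)


VECTOR_BYTES_PER_DOF = 8  # one double per entry of a solution vector

for k in (1, 2, 4, 8):
    row = csr_bytes_per_row_bound(k)
    print(f"k={k}: <= {row} bytes/row, {row / VECTOR_BYTES_PER_DOF:.0f}x one vector entry")
```

The bound grows like $(2k+1)^d$, whereas a matrix-free operator reads only the input and output vectors plus per-cell data, which is why the gap widens at high polynomial degree.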
“…Storing and loading the information for each quadrature point gives a memory-bound algorithm (Kronbichler and Kormann, 2019). In our experiments, we use, on affine meshes, compression that is applicable as $J_q$ is the same on several quadrature points and compute the metric terms from a triquadratic representation of a deformed cell geometry (Kronbichler et al., 2022).…”
Section: Matrix-free Operator Evaluation (mentioning; confidence: 99%)
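The compression relies on the Jacobian $J_q$ being identical at all quadrature points of an affine cell, so one matrix per cell can replace one per quadrature point. A minimal sketch of that detection step, assuming a trilinear hexahedron with lexicographically ordered vertices and a 2x2x2 Gauss rule; note that this swaps in a trilinear map for the triquadratic geometry representation mentioned in the statement, and all names are hypothetical.

```python
import itertools

import numpy as np


def trilinear_jacobian(vertices, xi):
    """Jacobian dx/dxi of the trilinear map of a hexahedron at reference point xi.

    vertices: (8, 3) array; vertex v sits at reference corner (v & 1, (v >> 1) & 1, (v >> 2) & 1).
    """
    J = np.zeros((3, 3))
    for v in range(8):
        bits = (v & 1, (v >> 1) & 1, (v >> 2) & 1)
        for j in range(3):
            dN = 1.0  # derivative of the shape function of vertex v with respect to xi[j]
            for d in range(3):
                if d == j:
                    dN *= 1.0 if bits[d] else -1.0
                else:
                    dN *= xi[d] if bits[d] else 1.0 - xi[d]
            J[:, j] += vertices[v] * dN
    return J


g = 0.5 / np.sqrt(3.0)
GAUSS_2x2x2 = [np.array(q) for q in itertools.product((0.5 - g, 0.5 + g), repeat=3)]


def compressed_jacobians(vertices, qpoints):
    """One Jacobian if it is the same at all quadrature points (affine cell), else one per point."""
    Js = np.array([trilinear_jacobian(vertices, q) for q in qpoints])
    if np.allclose(Js, Js[0], rtol=0.0, atol=1e-12 * np.abs(Js).max()):
        return Js[:1]  # affine cell: store 9 numbers instead of 9 per quadrature point
    return Js          # deformed cell: store one Jacobian per quadrature point
```

For an affine cell the function returns a single matrix; moving one vertex off the parallelepiped makes $J_q$ vary and all eight matrices are kept.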
“…In the context of, for example, discontinuous Galerkin methods, it is possible to store only the index of an unknown of a cell, assuming that all DoFs related to a cell are enumerated contiguously. A similar strategy can also be adopted for standard high-order FEM in the case that DoFs are contiguous on each geometric entity, as proposed by Kronbichler et al. (2022). Here, one needs to store only the first index of each entity of a cell ($3^d$).…”
Section: Restriction and Data Locality In Cell Loops (mentioning; confidence: 99%)
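The $3^d$ count follows because a $d$-dimensional hexahedral cell has $\binom{d}{m}2^{d-m}$ entities of dimension $m$, and $\sum_m \binom{d}{m}2^{d-m} = 3^d$; with $(k-1)^m$ DoFs per entity for continuous $Q_k$ elements, the binomial theorem gives $(k+1)^d$ DoFs in total. A minimal sketch of rebuilding all cell indices from one stored index per entity, assuming entities are ordered vertices, then edges, faces and interior, and ignoring the orientation handling that shared edges and faces need in practice; the function names are hypothetical.

```python
from math import comb


def entity_dof_counts(d, k):
    """DoFs per geometric entity of a d-dimensional Q_k cell, ordered by entity dimension m.

    A d-cube has comb(d, m) * 2**(d - m) entities of dimension m,
    each holding (k - 1)**m DoFs.
    """
    counts = []
    for m in range(d + 1):
        counts += [(k - 1) ** m] * (comb(d, m) * 2 ** (d - m))
    return counts


def expand_cell_indices(first_indices, counts):
    """Rebuild all cell DoF indices, assuming each entity's DoFs are numbered contiguously."""
    indices = []
    for start, n in zip(first_indices, counts):
        indices.extend(range(start, start + n))
    return indices
```

For $d = 3$ and $k = 4$, a cell then stores 27 integers instead of 125.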