2020
DOI: 10.1016/j.cam.2019.112697

Reproducibility strategies for parallel Preconditioned Conjugate Gradient

Abstract: The Preconditioned Conjugate Gradient method is often used in numerical simulations. While being widely used, the solver is also known for its lack of accuracy while computing the residual. In this article, we aim at a twofold goal: enhance the accuracy of the solver but also ensure its reproducibility in a message-passing implementation. We design and employ various strategies starting from the ExBLAS approach (through preserving every bit of information until final rounding) to its more lightweight performan…
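The idea of "preserving every bit of information until final rounding" can be illustrated with a small sketch. It is not the ExBLAS implementation: as a stand-in for a long accumulator it uses Python's exact rational arithmetic (`fractions.Fraction`), and the function names `naive_dot` and `exact_dot` as well as the test vectors are assumptions made for illustration only. The point is that a plain floating-point dot product depends on the summation order (which varies with the number of processes in a parallel reduction), while exact accumulation with a single final rounding does not.

```python
from fractions import Fraction
from itertools import permutations


def naive_dot(x, y):
    """Plain left-to-right dot product; every += rounds, so order matters."""
    acc = 0.0
    for a, b in zip(x, y):
        acc += a * b
    return acc


def exact_dot(x, y):
    """Accumulate exactly (stand-in for a long accumulator), round once at the end."""
    acc = Fraction(0)
    for a, b in zip(x, y):
        acc += Fraction(a) * Fraction(b)  # Fraction(float) is exact
    return float(acc)  # single, correctly rounded conversion


# Hypothetical data with heavy cancellation.
x = [1e16, 1.0, -1e16, 1.0]
y = [1.0, 1.0, 1.0, 1.0]

# Permuting x mimics the different summation orders of a parallel reduction.
naive_results = {naive_dot(p, y) for p in permutations(x)}
exact_results = {exact_dot(p, y) for p in permutations(x)}

print("naive:", sorted(naive_results))  # several different values
print("exact:", sorted(exact_results))  # one value, independent of order
```

Rational arithmetic is far too slow for a production solver; it is used here only because it makes the "no information lost before the final rounding" property easy to verify.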

Citations: cited by 9 publications (19 citation statements)
References: 27 publications
“…The ExBLAS and Opt implementations deliver both accurate and reproducible results that are identical to those of the MPFR library. Note that these results are identical to the ones from the pure MPI implementations in Iakymchuk et al (2019a), and only the results of the original code differ. The original code shows a difference from one digit on the initial iteration up to 5 digits on the 45th iteration on 48 cores (8 MPI processes with 6 OpenMP threads each).…”
Section: Results (supporting)
confidence: 68%
“…Motivated by “100 bits suffice for many HPC applications”, as noted by David Bailey at ARITH-21 (Bailey, 2013), and by a mini accumulator from the ARM team (Lutz and Hinds, 2017; Burgess et al, 2019), we derive a faster but less generic version using FPEs, which is the other core algorithmic component in the ExBLAS approach, aiming to adjust the algorithm to the problem at hand. As a consequence, we also address the common issue of sparse iterative solvers (the accuracy while computing the residual) and propose to use solutions that offer reproducibility (and potentially correct rounding) only while computing the corresponding dot products. Hence, we derive two hybrid (MPI + OpenMP tasks), reproducible, and accurate dot products using ExBLAS and FPEs. Finally, we demonstrate the applicability and feasibility of this idea with the ExBLAS- and FPE-based approaches in the hybrid MPI + OpenMP implementation of PCG on the example of a 3D Poisson’s equation with 27 stencil points as well as several test matrices from the SuiteSparse matrix collection. This extends our previous results with the pure MPI implementation of PCG (Iakymchuk et al, 2019a) to the more complex double-level dot products and reductions with dynamic scheduling of the tasks. …”
Section: Introduction (supporting)
confidence: 82%
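The FPE-based dot product mentioned in the statement above can be sketched roughly as follows. This is a sequential illustration under stated assumptions, not the cited hybrid MPI + OpenMP code: the helper names (`two_sum`, `split`, `two_prod`, `fpe_dot`), the expansion size of 4, and the use of a `Fraction` as the fallback accumulator for values that do not fit in the expansion are all hypothetical choices. The building blocks are the standard error-free transformations (Knuth's TwoSum and Dekker's TwoProduct), which are exact for doubles in round-to-nearest as long as no overflow or underflow occurs.

```python
from fractions import Fraction
from itertools import permutations


def two_sum(a, b):
    """Knuth's TwoSum: returns (s, e) with s = fl(a + b) and a + b = s + e exactly."""
    s = a + b
    t = s - a
    return s, (a - (s - t)) + (b - t)


def split(a):
    """Veltkamp splitting of a double into high and low halves."""
    c = 134217729.0 * a  # 2**27 + 1
    hi = c - (c - a)
    return hi, a - hi


def two_prod(a, b):
    """Dekker's TwoProduct: returns (p, e) with p = fl(a * b) and a * b = p + e exactly."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    return p, ((ah * bh - p) + ah * bl + al * bh) + al * bl


def fpe_dot(x, y, size=4):
    """Dot product accumulated in a floating-point expansion of `size` doubles."""
    fpe = [0.0] * size
    spill = Fraction(0)  # assumed fallback for whatever the expansion cannot hold

    def add(v):
        nonlocal spill
        for i in range(size):
            fpe[i], v = two_sum(fpe[i], v)  # push the rounding error down the expansion
            if v == 0.0:
                return
        spill += Fraction(v)  # expansion exhausted: keep the remainder exactly

    for a, b in zip(x, y):
        p, e = two_prod(a, b)
        add(p)
        add(e)
    # The expansion plus the spill hold the exact sum; round only once here.
    return float(sum((Fraction(v) for v in fpe), spill))


x = [1e16, 1.0, -1e16, 1.0]
y = [1.0, 1.0, 1.0, 1.0]
print({fpe_dot(p, y) for p in permutations(x)})  # a single value for every ordering
```

Because no rounding error is discarded before the final conversion, the result does not depend on the order in which the products arrive. That order independence is the property a reproducible reduction across processes and tasks needs.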