Pipelined Krylov subspace methods typically offer improved strong scaling on parallel HPC hardware compared to standard Krylov subspace methods for large and sparse linear systems. In pipelined methods the traditional synchronization bottleneck is mitigated by overlapping time-consuming global communications with useful computations. To achieve this communication hiding, however, pipelined methods introduce additional recurrence relations for a number of auxiliary variables that are required to update the approximate solution. This paper studies the influence of the local rounding errors introduced by these additional recurrences in the pipelined Conjugate Gradient method. Specifically, we analyze the impact of local round-off effects on the attainable accuracy of the pipelined CG algorithm and compare it to the traditional CG method. Furthermore, we estimate the gap between the true residual and the recursively computed residual used in the algorithm. Based on this estimate we suggest an automated residual replacement strategy to reduce the loss of attainable accuracy in the final iterative solution. The resulting pipelined CG method with residual replacement improves the maximal attainable accuracy of pipelined CG while maintaining the efficient parallel performance of the pipelined method. This conclusion is substantiated by numerical results for a variety of benchmark problems.

Early parallel variants of the CG algorithm date back to the late 1980's [44] and early 1990's [2,9,10,13,19]. The idea of reducing the number of global communication points in Krylov subspace methods on parallel computer architectures was also used in the s-step methods by Chronopoulos et al. [6,7,8] and more recently by Carson et al. in [3,4]. In addition to communication-avoiding methods, research on hiding global communication by overlapping communication with computations was performed by various authors over the last decades; see Demmel et al. [13], De Sturler et al. [11], and Ghysels et al. [21,22].
We refer the reader to the recent work [5], Section 2, and the references therein for more background and a wider historical perspective on the development of early variants of the CG algorithm that contributed to the current drive towards parallel efficiency.

The pipelined CG (p-CG) method proposed in [22] aims at hiding the global synchronization latency of standard preconditioned CG by removing some of the global synchronization points. Pipelined CG performs only one global reduction per iteration. Furthermore, this global communication phase is overlapped with the sparse matrix-vector product (spmv), which requires only local communication. In this way, idle core time is minimized by performing useful computations concurrently with the time-consuming global communication phase, cf. [18].

The reorganization of the CG algorithm that is performed to achieve this overlap of communication with computations introduces several additional axpy (y ← αx + y) operations to recursively compute auxiliary variables. Vector operations such as an axpy are typically ...
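As a concrete (sequential) illustration of these recurrences, the sketch below implements the unpreconditioned pipelined CG iteration in the form proposed in [22]. The variable names and the dense NumPy operations are our own assumptions for illustration; the comments mark where, in an actual parallel implementation, the single global reduction would be overlapped with the spmv.

```python
import numpy as np

def pipelined_cg(A, b, x0, tol=1e-10, maxit=200):
    """Sketch of unpreconditioned pipelined CG (single-reduction recurrences)."""
    x = x0.copy()
    r = b - A @ x                 # recursive residual
    w = A @ r
    gamma_old = alpha = 1.0
    z = s = p = None
    for i in range(maxit):
        # -- these two dot products form the single global reduction phase --
        gamma = r @ r
        delta = w @ r
        if np.sqrt(gamma) < tol:
            break
        # -- the spmv is overlapped with that reduction on parallel hardware --
        q = A @ w
        if i == 0:
            beta = 0.0
            alpha = gamma / delta
            z, s, p = q.copy(), w.copy(), r.copy()
        else:
            beta = gamma / gamma_old
            alpha = gamma / (delta - beta * gamma / alpha)
            # extra axpy recurrences introduced by pipelining (s = A p, z = A s)
            z = q + beta * z
            s = w + beta * s
            p = r + beta * p
        x = x + alpha * p
        r = r - alpha * s         # recursive residual: drifts from b - A x
        w = w - alpha * z
        gamma_old = gamma
    return x
```

The recurrences for z, s, and p are precisely the additional axpy operations mentioned above: each introduces its own local rounding errors, which is the subject of the analysis in this paper.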
Pipelined Krylov solvers typically display better strong scaling than standard Krylov methods for large linear systems. The synchronization bottleneck is mitigated by overlapping time-consuming global communications with computations. To achieve this hiding of communication, pipelined methods feature additional recurrence relations for auxiliary variables. This paper analyzes why rounding error effects have a significantly larger impact on the accuracy of pipelined algorithms. An algebraic model for the accumulation of rounding errors in the (pipelined) CG algorithm is derived. Furthermore, an automated residual replacement strategy is proposed to reduce the effect of rounding errors on the final solution. MPI parallel performance tests implemented in PETSc on an Intel Xeon X5660 cluster show that the pipelined CG method with automated residual replacement is more resilient to rounding errors while maintaining the efficient parallel performance obtained by pipelining.
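The replacement mechanism itself can be sketched in a few lines. Note that this toy version resets the recursive residual at a fixed period, whereas the strategy proposed in the paper chooses the replacement steps automatically from an estimate of the residual gap; the periodic rule, the function name, and the dense operations here are illustrative assumptions only.

```python
import numpy as np

def cg_rr(A, b, replace_every=25, tol=1e-12, maxit=400):
    """Sketch of CG with periodic residual replacement (illustrative only)."""
    x = np.zeros_like(b)
    r = b.copy()                  # recursive residual, drifts from b - A x
    p = r.copy()
    gamma = r @ r
    for i in range(1, maxit + 1):
        Ap = A @ p
        alpha = gamma / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        if i % replace_every == 0:
            r = b - A @ x         # replacement: one extra spmv resets the gap
        gamma_new = r @ r
        if np.sqrt(gamma_new) < tol:
            break
        p = r + (gamma_new / gamma) * p
        gamma = gamma_new
    return x
```

Each replacement costs an extra spmv, so an automated criterion that triggers it only when the estimated gap becomes significant, as proposed in the paper, is preferable to a fixed period.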
The conjugate gradient (CG) method is the most widely used iterative scheme for the solution of large sparse systems of linear equations when the matrix is symmetric positive definite. Although more than sixty years old, it is still a serious candidate for extreme-scale computation on large computing platforms. On the technological side, the continuous shrinking of transistor geometry and the increasing complexity of these devices dramatically affect their sensitivity to natural radiation, and thus diminish their reliability. One of the most common effects produced by natural radiation is the single event upset, which consists of a bit flip in a memory cell that produces unexpected results at the application level. Consequently, future computing facilities at extreme scale may be more prone to errors of any kind, including bit flips during calculation. These numerical and technological observations are the main motivations for this work, in which we first investigate, through extensive numerical experiments, the sensitivity of CG to bit flips in its main computationally intensive kernels, namely the matrix-vector product and the preconditioner application. We further propose numerical criteria to detect the occurrence of such faults and assess their robustness through extensive numerical experiments.
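A minimal model of such a single event upset is flipping one bit of an IEEE-754 double, for instance in one entry produced by the matrix-vector product. The helper below is our own sketch (not the paper's fault injector or its detection criteria); it shows why the impact of a silent fault ranges from catastrophic to negligible depending on which bit is hit.

```python
import struct

def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = mantissa LSB, 52-62 = exponent, 63 = sign) of a float64."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))
    return flipped

# An exponent-field flip changes the magnitude by hundreds of binades
# (here 1.0 becomes 2**-512), while a low mantissa flip is a relative
# perturbation of order 2**-52 -- indistinguishable from round-off.
upset = flip_bit(1.0, 61)   # exponent bit: 1.0 -> 2**-512
noise = flip_bit(1.0, 0)    # mantissa LSB: 1.0 -> 1.0 + 2**-52
```

Large-magnitude upsets of this kind are what cheap numerical sanity checks on smoothly evolving quantities (such as the residual norm) can hope to flag, whereas low-mantissa flips are typically absorbed by the iteration.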