2017
DOI: 10.1002/cpe.4280
Communication in task‐parallel ILU‐preconditioned CG solvers using MPI + OmpSs

Abstract: We target the parallel solution of sparse linear systems via iterative Krylov subspace-based methods enhanced with ILU-type preconditioners on clusters of multicore processors. In order to tackle large-scale problems, we develop task-parallel implementations of the classical iteration for the CG method, accelerated via ILUPACK and ILU(0) preconditioners, using MPI+OmpSs. In addition, we integrate several communication-avoiding (CA) strategies into the codes, including the butterfly communication scheme and Eijk…

Cited by 5 publications (8 citation statements). References 19 publications.
“…Code availability. Our solvers utilize functionality from the following libraries: ILUPACK (http://ilupack.tu-bs.de, Bollhöfer, 2020), PARDISO (https://www.pardiso-project.org, Davis et al, 2016), and PETSc (https://www.mcs.anl.gov/petsc, Balay et al, 2019a). PETSc is open source under a BSD-2 license; ILUPACK and PARDISO are closed source and offer complementary academic licenses.…”
Section: Discussion
confidence: 99%
“…This level is exploited in each node of the cluster using, for example, OpenMP. The analysis in Aliaga et al (2017); Barreda et al (2019) shows that, in the PCG, a reasonable option is to leverage task-parallelism, which consists of dividing each kernel into a collection of finer-grain operations, or tasks. Then, each thread executes a different task and two consecutive kernels can be executed concurrently, avoiding a thread-synchronization point after each kernel, as described next.…”
Section: Algorithm(s)
confidence: 99%
“…Our work builds upon a number of previous papers that address the task-parallel implementation of KSMs on multicore architectures and clusters of multicore processors. First, the authors of [1] proposed a parallel implementation of a CG solver, enhanced with a sophisticated ILUPACK preconditioner, that leverages MPI and OmpSs [13,14] to improve the performance of a pure MPI-based solution; this approach was then generalized to other types of incomplete LU (ILU)-based preconditioners and communication-reduced variants of CG in [2]. Independently, the authors of [18] presented an iteration-fusing variant of the pipelined CG [9], for multicore processors, that combines a task-parallel re-formulation of the method with a relaxation of the convergence test in order to break the strict barrier between consecutive iterations of the method.…”
Section: Introduction
confidence: 99%