2020
DOI: 10.1016/j.parco.2019.102599

AMG based on compatible weighted matching for GPUs

Abstract: We describe main issues and design principles of an efficient implementation, tailored to recent generations of Nvidia Graphics Processing Units (GPUs), of an Algebraic MultiGrid (AMG) preconditioner previously proposed by one of the authors and already available in the open-source package BootCMatch: Bootstrap algebraic multigrid based on Compatible weighted Matching for standard CPU. The AMG method relies on a new approach for coarsening sparse symmetric positive definite (s.p.d.) matrices, named coarsening …

Citations: cited by 12 publications (9 citation statements)
References: 26 publications
“…The method, named coarsening based on compatible weighted matching was first introduced in [22] and is already available in the sequential package described in [21]. A first parallel version of the method, exploiting fine-grained parallelism and specifically tailored for single GPU device is described in [7,8]. The method is independent of any heuristics or a priori information on the near kernel of A, i.e., the lower part of the range of eigenvalues of the system matrix A which is generally used to obtain good-quality aggregates, and it is a completely automatic procedure applicable to general s.p.d.…”
Section: Parallel Aggregation Based On Weighted Graph Matching (mentioning)
confidence: 99%
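
To illustrate the general idea behind matching-based aggregation mentioned in the statement above, here is a minimal sketch of pairwise aggregation by greedy weighted matching. It is not the algorithm of the cited paper or of BootCMatch: the function name `greedy_matching_aggregates`, the edge-list input, and the use of arbitrary caller-supplied edge weights are all assumptions made for illustration, and a simple sequential greedy matching stands in for whatever matching algorithm the authors use.

```python
def greedy_matching_aggregates(n, edges):
    """Pair vertices of a weighted graph by greedy matching.

    n     -- number of vertices (rows of the matrix), labeled 0..n-1
    edges -- list of (i, j, weight) tuples with i != j; how the weights are
             derived from the matrix is left to the caller (assumption)

    Returns (aggregate_of, num_aggregates), where aggregate_of[v] is the
    index of the coarse aggregate that vertex v belongs to.
    """
    aggregate_of = [-1] * n
    num_aggregates = 0

    # Visit the heaviest edges first; ties are broken by vertex index so the
    # result is deterministic.
    for i, j, _w in sorted(edges, key=lambda e: (-e[2], e[0], e[1])):
        # An edge joins the matching only if both endpoints are still free.
        if aggregate_of[i] < 0 and aggregate_of[j] < 0:
            aggregate_of[i] = aggregate_of[j] = num_aggregates
            num_aggregates += 1

    # Vertices left unmatched become singleton aggregates.
    for v in range(n):
        if aggregate_of[v] < 0:
            aggregate_of[v] = num_aggregates
            num_aggregates += 1

    return aggregate_of, num_aggregates


# Usage: a 5-vertex path graph with unit weights (hypothetical input).
path_edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0)]
aggregate_of, num_aggregates = greedy_matching_aggregates(5, path_edges)
print(aggregate_of, num_aggregates)
```

Each matched pair becomes one coarse unknown, so a single matching pass reduces the problem size by at most a factor of two. The sketch takes no information about the near kernel of A as input, which reflects the "completely automatic" property the statement describes, but only under the simplifications stated here.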
“…The new power-to-solution metrics requires a rethinking of many computational kernels of HPC applications looking for a trade-off between the reduction of the total energy and the minimization of the time-to-solution, promoting scalability. Within this context, extensions and improvements of high-performance algorithms and SW libraries for kernels in numerical linear algebra [44], [45] and graph computation, such as iterative [46], [47], [48], [49] and direct linear solvers, edge weighted graph matching, and fast multipole methods [50] will be deployed.…”
Section: Mathlib (mentioning)
confidence: 99%
“…Its parallelization does not present particular difficulties as mainly relies on standard sparse linear algebra operations such as sparse matrix by matrix and matrix by vector products or other basic tasks which are nowadays readily handled by the highly optimized Basic Linear Algebra Operations (BLAS) routines [5]. The only two tasks that present some difficulties from the parallelization viewpoint are the smoother set-up and application [2] and the coarsening stage [4,3], which have been deeply investigated by several authors in recent years.…”
Section: Special Features Available In Chronos To Increase Performance (mentioning)
confidence: 99%