2019
DOI: 10.1016/j.jocs.2019.02.004

Exploiting nested task-parallelism in the H-LU factorization

Abstract: We address the parallelization of the LU factorization of hierarchical matrices (H-matrices) arising from boundary element methods. Our approach exploits task-parallelism via the OmpSs programming model and runtime, which discovers the data-flow parallelism intrinsic to the operation at execution time, via the analysis of data dependencies based on the memory addresses of the tasks' operands. This is especially challenging for H-matrices, as the structures containing the data vary in dimension during the execu…
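As an illustration of that data-flow model, here is a minimal sketch, in plain OpenMP rather than OmpSs and over a regular dense block partition rather than an actual H-matrix: each block kernel becomes a task, and the runtime builds the task graph from the operand addresses named in the dependency clauses. The kernel names (getrf, trsm, gemm) and the block-pointer layout are placeholders, not the paper's code.

/* Minimal sketch (not the paper's OmpSs code): task-parallel blocked LU in
 * plain OpenMP.  T is an nb x nb array of pointers to b x b blocks; the
 * pointer slots T[i][j] serve as dependency tokens for the blocks they
 * reference, so the runtime derives the task DAG from the listed addresses. */
void getrf(double *Akk, int b);                                      /* Akk = Lkk * Ukk   */
void trsm(const double *Akk, double *Apanel, int b);                 /* triangular solve  */
void gemm(const double *Aik, const double *Akj, double *Aij, int b); /* Aij -= Aik * Akj  */

void blocked_lu(int nb, int b, double *T[nb][nb])
{
    #pragma omp parallel
    #pragma omp single
    for (int k = 0; k < nb; ++k) {
        #pragma omp task depend(inout: T[k][k])
        getrf(T[k][k], b);                                /* factor diagonal block */

        for (int i = k + 1; i < nb; ++i) {
            #pragma omp task depend(in: T[k][k]) depend(inout: T[i][k])
            trsm(T[k][k], T[i][k], b);                    /* block column panel    */
            #pragma omp task depend(in: T[k][k]) depend(inout: T[k][i])
            trsm(T[k][k], T[k][i], b);                    /* block row panel       */
        }
        for (int i = k + 1; i < nb; ++i)
            for (int j = k + 1; j < nb; ++j) {
                #pragma omp task depend(in: T[i][k], T[k][j]) depend(inout: T[i][j])
                gemm(T[i][k], T[k][j], T[i][j], b);       /* Schur complement      */
            }
    }
}

On an actual H-matrix the blocks are low-rank or themselves hierarchical, and their dimensions change while the factorization runs, which is exactly the complication the abstract points to.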

Cited by 10 publications (3 citation statements)
References 12 publications

“…In particular, Gillman et al [19] propose an inversion scheme based on the Sherman-Morrison-Woodbury formula. However, to our knowledge, the algorithms presented here are the first in the literature to exploit an LU factorization for BLR² matrices, similarly to LU-based algorithms for other matrix formats, such as BLR (Algorithm 2.1, [5]), H [14,20], HSS [15], and H² [26]. At each step k, we first compute the LU factorization of the diagonal block A_kk = L_kk U_kk (line 8). Then, we perform the triangular solves…”
Section: LU Factorization and Solution Algorithms (mentioning)
confidence: 99%
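For context, the step the quoted passage starts to describe typically continues as follows (a sketch in the quoted notation; the exact loop bounds and line numbers of the cited algorithm may differ):

\begin{aligned}
A_{kk} &= L_{kk} U_{kk} && \text{factor the diagonal block}\\
L_{ik} &\leftarrow A_{ik}\, U_{kk}^{-1}, \quad i > k && \text{triangular solves on the block column}\\
U_{kj} &\leftarrow L_{kk}^{-1} A_{kj}, \quad j > k && \text{triangular solves on the block row}\\
A_{ij} &\leftarrow A_{ij} - L_{ik} U_{kj}, \quad i, j > k && \text{Schur-complement update of the trailing blocks}
\end{aligned}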
“…A weak dependency from a parent task to a sub-task does not require the parent task to wait for the sub-task to finish, as would be needed in OmpSs (or OpenMP), thereby avoiding unnecessary task synchronisation. This extension would permit the implementation of nested functions with fine-grained dependencies, as the H-matrix arithmetic makes use of, and was used in [10] to fully implement task-based H-matrix arithmetic. The presented numerical results demonstrate that the technique has some potential but needs further optimizations to be efficient for a wide range of H-matrix structures.…”
Section: Introduction (mentioning)
confidence: 99%
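To make the contrast concrete, here is a small, hypothetical OpenMP sketch of the synchronization the quoted passage refers to: a coarse-grained parent task that spawns fine-grained sub-tasks must issue a taskwait before it finishes, because in OpenMP (and OmpSs) its outgoing dependencies are released when the parent completes, not when its children do. A weak dependency, as described in the quoted work, lets the sub-tasks' own fine-grained dependencies carry that ordering instead.

/* Hypothetical sketch of the synchronization discussed above (not code from
 * the cited work); assumed to be called from inside an active parallel region.
 * The parent task owns a coarse block and spawns one sub-task per sub-block.
 * Its inout dependency on 'block' is only released when the parent completes,
 * so it has to wait for its children first. */
void update_subblock(double *sub, int b);        /* placeholder fine-grained kernel */

void update_block(double *block, double *sub[], int nsub, int b)
{
    #pragma omp task depend(inout: block[0])     /* coarse-grained parent task */
    {
        for (int s = 0; s < nsub; ++s) {
            #pragma omp task depend(inout: sub[s])   /* fine-grained sub-tasks */
            update_subblock(sub[s], b);
        }
        /* Required in OpenMP/OmpSs: without it the parent would finish (and
         * release its dependency on 'block') before the sub-tasks do.  A weak
         * dependency removes the need for this synchronization point. */
        #pragma omp taskwait
    }
}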
“…For example, when designing linear algebra solvers based on low-rank approximation algorithms, it is almost impossible to predict the right DAG to ensure good numerical accuracy [8][9][10][11][12]. In general, most modern task-based runtime systems suffer from a lack of dynamism in task-graph generation. Some programming models, such as [7,13,14], support just-in-time DAG submission, but the generation either follows static rules or requires a significant amount of programming effort.…”
mentioning
confidence: 99%