2016
DOI: 10.1145/2898348
Implementing Multifrontal Sparse Solvers for Multicore Architectures with Sequential Task Flow Runtime Systems

Abstract: To face the advent of multicore processors and the ever-increasing complexity of hardware architectures, programming models based on DAG parallelism have regained popularity in the high-performance scientific computing community. Modern runtime systems offer a programming interface that complies with this paradigm and powerful engines for scheduling the tasks into which the application is decomposed. These tools have already proved their effectiveness on a number of dense linear algebra applications. This paper ev…

Cited by 41 publications (58 citation statements)
References 32 publications
“…They show the scalability of qrm parsec using both the 1D and 2D front factorization algorithms; the speedups are computed with respect to the sequential running time reported in Table 1. These are compared to the results obtained with an equivalent implementation based on the Sequential Task Flow model and the StarPU runtime system [2]. The results show that qrm parsec achieves a satisfactory performance on all the tested matrices, including the smallest ones (on the left side of the plot) with speedups close to 20 (out of 24) for the largest size ones.…”
Section: Early Experimental Results
confidence: 97%
“…The runs were performed on the Dude system which is a shared-memory machine equipped with four AMD Opteron(tm) Processor 8431 (six cores) and 72 GB of memory. As a reference, we also report on the performance of the STF implementation of the solver from [2], which is supported with StarPU and named qrm starpu below. The experimental results are presented in Figure 4.…”
Section: Early Experimental Results
confidence: 99%
“…This approach is complex and usually requires completely rewriting an application. The second method is the sequential task flow (STF) (Agullo et al, 2016b). Here, a single thread creates the tasks by informing the RS about the access of each of them on the data.…”
Section: Task-based Parallelization
confidence: 99%
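The STF mechanism quoted above — a single thread submits tasks in program order, declaring each task's data accesses, while the runtime infers the dependencies — can be sketched as follows. This is an illustrative toy in Python, not the StarPU or PaRSEC API: the class and method names (`STFRuntime`, `submit`) are invented for the example. It derives the classic read-after-write, write-after-read, and write-after-write dependencies from the declared access modes.

```python
# Toy sketch of the Sequential Task Flow (STF) model: one thread submits
# tasks sequentially, declaring which data each task reads and writes; the
# "runtime" infers dependencies (RAW, WAR, WAW) so that independent tasks
# could be scheduled in parallel. Names are illustrative, not a real API.

from collections import defaultdict

class STFRuntime:
    def __init__(self):
        self.last_writer = {}             # data -> last task that wrote it
        self.readers = defaultdict(list)  # data -> readers since last write
        self.deps = defaultdict(set)      # task -> tasks it must wait for

    def submit(self, task, reads=(), writes=()):
        """Called by the single submitting thread, in program order."""
        for d in reads:
            if d in self.last_writer:          # read-after-write dependency
                self.deps[task].add(self.last_writer[d])
            self.readers[d].append(task)
        for d in writes:
            if d in self.last_writer:          # write-after-write dependency
                self.deps[task].add(self.last_writer[d])
            for r in self.readers[d]:          # write-after-read dependency
                if r != task:
                    self.deps[task].add(r)
            self.last_writer[d] = task
            self.readers[d] = []
        return self.deps[task]

rt = STFRuntime()
rt.submit("factor_A", writes=["A"])
rt.submit("update_B", reads=["A"], writes=["B"])  # waits on factor_A
rt.submit("update_C", reads=["A"], writes=["C"])  # waits on factor_A only
```

Because `update_B` and `update_C` only read `A`, they depend on `factor_A` but not on each other, so a scheduler is free to run them concurrently; this is the property that lets an STF runtime extract DAG parallelism from sequentially submitted code.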
“…The StarPU runtime has been configured with the PRIO scheduler (with a central queue on each node, sorting tasks by priorities given by the developer) and dedicates, on each node, one core for task submission (using the Sequential Task Flow paradigm 45 ) and another core to handle MPI operations. As an introductory illustration, we consider the Chameleon/Cholesky decomposition of an input matrix of dimension 72,000, divided in 75 × 75 tiles of size 960 (ie, with 75 dpotrf tasks), executed on two nodes comprising five CPU and two GPU workers each, and interconnected through a 10 Gb/s Ethernet network.…”
Section: Visualization Panels
confidence: 99%
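The arithmetic in the Chameleon/Cholesky example above can be checked directly: a matrix of dimension 72,000 split into tiles of size 960 gives 75 × 75 tiles, hence 75 dpotrf tasks (one per diagonal tile). The sketch below also counts the other kernels of a standard tiled right-looking Cholesky; these counts are textbook formulas, not output from Chameleon, and the function name is invented for the example.

```python
# Task counts for a tiled right-looking Cholesky factorization of an n x n
# matrix with square tiles of size `tile` (assuming tile divides n evenly).
# At step k (of t steps): 1 dpotrf, (t-1-k) dtrsm, (t-1-k) dsyrk, and
# C(t-1-k, 2) dgemm tasks; summing over k gives the closed forms below.

def tiled_cholesky_task_counts(n, tile):
    t = n // tile                              # tiles per dimension
    return {
        "tiles_per_dim": t,
        "dpotrf": t,                           # one per diagonal tile
        "dtrsm": t * (t - 1) // 2,             # panel solves
        "dsyrk": t * (t - 1) // 2,             # diagonal-tile updates
        "dgemm": t * (t - 1) * (t - 2) // 6,   # off-diagonal updates
    }

counts = tiled_cholesky_task_counts(72_000, 960)
```

With these parameters the count of dpotrf tasks is 75, matching the "75 × 75 tiles of size 960 (ie, with 75 dpotrf tasks)" stated in the quotation.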