A high‐productivity task‐based programming model for clusters

Tejedor, Enric; Farreras, Montse; Grove, David; Badía, Rosa M.; Almasi, Gheorghe; Labarta, Jesús

doi:10.1002/cpe.2831

Cited by 23 publications

(29 citation statements)

References 20 publications

“…In the literature, different approaches have been proposed. ClusterSs [13] has a master-slave model where a master process is responsible for maintaining this global view and delegates actual numerical work to other processes. The authors showed that the extreme centralization of this model prevented it from achieving high performance at scale.…”

Section: Algorithm 2 Trsm Kernel Of the Ptg Tile Cholesky (mentioning)

confidence: 99%

“…StarSs is a suite of runtime systems developed at the Barcelona Supercomputing Center and supporting the STF model. Among them, ClusterSs [13] provides an STF support for parallel distributed memory machines with a master-slave interaction scheme. OmpSs [14] targets SMP, SMP-NUMA, GPU and cluster platforms.…”

Section: Short Review Of Task-based Runtime Systems For Distributed M (mentioning)

confidence: 99%

“…Many studies have indeed shown that task-based numerical algorithms could compete against or even surpass state-of-the-art highly optimized low-level peers in areas as diverse as dense linear algebra [1]-[4], sparse linear algebra [5]-[7], fast multipole methods [8], [9], H-matrix computation [10] or stencil computation [11], [12], to name a few. Moreover, various task-based runtime systems making use of this paradigm ([3], [13]-[17] to cite a few) have reached a high level of robustness, incurring very limited management overhead while enabling a high level of expressiveness as further discussed in Section 2.2.…”

Section: Introduction (mentioning)

confidence: 99%

Achieving High Performance on Supercomputers with a Sequential Task-based Programming Model

Agullo, Aumage, Faverge et al. 2024

IEEE Trans. Parallel Distrib. Syst.

The emergence of accelerators as standard computing resources on supercomputers and the subsequent architectural complexity increase revived the need for high-level parallel programming paradigms. Sequential task-based programming model has been shown to efficiently meet this challenge on a single multicore node possibly enhanced with accelerators, which motivated its support in the OpenMP 4.0 standard. In this paper, we show that this paradigm can also be employed to achieve high performance on modern supercomputers composed of multiple such nodes, with extremely limited changes in the user code. To prove this claim, we have extended the StarPU runtime system with an advanced inter-node data management layer that supports this model by posting communications automatically. We illustrate our discussion with the task-based tile Cholesky algorithm that we implemented on top of this new runtime system layer. We show that it enables very high productivity while achieving a performance competitive with both the pure Message Passing Interface (MPI)-based ScaLAPACK Cholesky reference implementation and the DPLASMA Cholesky code, which implements another (non-sequential) task-based programming paradigm.

“…In the context of SDF, several frameworks such as KAAPI [19] or StarSS [32,36] exist, which ease the programming of multi-threaded implementations. Both provide libraries for the concurrent execution of nodes and the communication, which are based on a task model.…”

Section: Related Work (mentioning)

confidence: 99%

Out-Of-order execution of synchronous data-flow networks

Baudisch, Brandt, Schneider 2012

2012 International Conference on Embedded Computer Systems (SAMOS)

Data flow process networks (DPNs) have been introduced as a convenient model of computation for distributed and asynchronous systems since each process node can work independently of the other nodes, i. e. without the need of a global coordination. Synchronous and cyclo-static data flow process networks even allow to derive at compile-time efficient static schedules that allow one to run these systems with an efficient use of available resources, e. g. in embedded systems. Single process nodes of DPNs are stream-based computing devices that transform input streams to uniquely defined corresponding output streams such that single values of the output streams are computed as soon as sufficient input values are available. In this sense, they are related to the execution of an instruction stream by a conventional microprocessor. In this paper, we show how out-of-order execution that has been introduced for the efficient use of multiple functional units in microprocessors can also be used for the implementation of DPNs on multiprocessors. This way, the implementation of DPNs on multiprocessors allows one to optimize the throughput of single process nodes, and as shown by our experiments, also of the entire DPN.

“…An extended description of the ClusterSs design, programming model, implementation and productivity study can be found in [7].…”

Section: Introduction (mentioning)

confidence: 99%

ClusterSs

Tejedor, Farreras, Grove et al. 2011

Proceedings of the 20th International Symposium on High Performance Distributed Computing

Self Cite

Programming for large-scale, multicore-based architectures requires adequate tools that offer ease of programming while not hindering application performance. StarSs is a family of parallel programming models based on automatic function level parallelism that targets productivity. StarSs deploys a data-flow model: it analyses dependencies between tasks and manages their execution, exploiting their concurrency as much as possible. We introduce Cluster Superscalar (ClusterSs), a new StarSs member designed to execute on clusters of SMPs. ClusterSs tasks are asynchronously created and assigned to the available resources with the support of the IBM APGAS runtime, which provides an efficient and portable communication layer based on one-sided communication. This short paper gives an overview of the ClusterSs design on top of APGAS, as well as the conclusions of a productivity study; in this study, ClusterSs was compared to the IBM X10 language, both in terms of programmability and performance. A technical report is available with the details.
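The data-flow idea described in the abstract above (tasks are submitted in program order, and a runtime derives their dependencies from the data each task reads and writes) can be illustrated with a small Python sketch. This is not the ClusterSs or StarSs API: the function names `build_dependencies` and `schedule_waves`, the task tuples, and the data names are all hypothetical, and the sketch ignores distribution, communication, and data renaming entirely.

```python
from collections import defaultdict


def build_dependencies(tasks):
    """Derive task dependencies from declared data accesses.

    tasks: list of (name, reads, writes) tuples in submission order.
    Returns a dict mapping each task name to the set of earlier tasks
    it must wait for. (Hypothetical helper, for illustration only.)
    """
    last_writer = {}                         # datum -> last task that wrote it
    readers_since_write = defaultdict(list)  # datum -> readers since that write
    deps = {}
    for name, reads, writes in tasks:
        wait_for = set()
        for datum in reads:
            if datum in last_writer:
                wait_for.add(last_writer[datum])        # read-after-write
        for datum in writes:
            if datum in last_writer:
                wait_for.add(last_writer[datum])        # write-after-write
            wait_for.update(readers_since_write[datum])  # write-after-read
        deps[name] = wait_for
        for datum in reads:
            readers_since_write[datum].append(name)
        for datum in writes:
            last_writer[datum] = name
            readers_since_write[datum] = []
    return deps


def schedule_waves(deps):
    """Group tasks into waves; tasks in one wave have no mutual dependencies."""
    remaining = {task: set(d) for task, d in deps.items()}
    waves = []
    while remaining:
        ready = sorted(task for task, d in remaining.items() if not d)
        if not ready:
            raise ValueError("cyclic dependencies")
        waves.append(ready)
        for task in ready:
            del remaining[task]
        for d in remaining.values():
            d.difference_update(ready)
    return waves


# Hypothetical task stream: t1 and t2 are independent, t3 consumes both,
# and t4 updates "a" in place, so it must wait for t3 to finish reading it.
tasks = [
    ("t1", [], ["a"]),
    ("t2", [], ["b"]),
    ("t3", ["a", "b"], ["c"]),
    ("t4", ["a"], ["a"]),
]
deps = build_dependencies(tasks)
waves = schedule_waves(deps)
```

In this sketch `t1` and `t2` land in the same wave and could run concurrently, which is the kind of concurrency the abstract says the runtime exploits. A real runtime would dispatch tasks as soon as their dependencies resolve instead of in fixed waves.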
