2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PHD Forum 2011
DOI: 10.1109/ipdps.2011.299
Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA

Abstract: We present a method for developing dense linear algebra algorithms that seamlessly scales to thousands of cores, implemented in our project DPLASMA (Distributed PLASMA), which builds on a novel generic distributed Direct Acyclic Graph Engine (DAGuE). The engine has been designed for high-performance computing, and it enables tile algorithms, originating in PLASMA, to scale on large distributed-memory systems. The underlying DAGuE framework has many appealing features when considering distributed-mem…

Cited by 132 publications (120 citation statements); References 21 publications.
“…For every parameter that appears in the execution space of a Relation's destination, we solve the equality constraints in the conjunction of constraints for this parameter. Consider, as an example, the Relation: This way, when the run-time is processing task T_b(7,8), for example, it can compute in O(1) time that it needs to send tile A[8][8] to task T_a(8). Also, when processing task T_b(7,11), the run-time can compute that A[11][11] should not be sent to any instance of T_a, since the condition (1 + k) == m is not true (clearly, 1 + 7 ≠ 11).…”
Section: Interprocess Data Exchange
confidence: 99%
“…be the parameters of the task that correspond to N0 /* Initiate Cycle(N0) with an empty (tautologic) Relation to self. */ The performance of the DAGuE run-time has been extensively studied in related publications [8,9,7]. The goal of this paper is to present the compiler front-end of the system, so we present only a summary of performance results to demonstrate that our toolchain can automatically analyze, schedule and execute non-trivial algorithms, and deliver high performance at scale.…”
Section: Function FinalizeAntiDependencies(IG)
confidence: 99%
“…[4] This architecture uses many copies of the same core, improving the total computational capacity of a single chip. Multi-core processors have better performance and area characteristics than complex single-core processors. [5] They propose and evaluate single-ISA heterogeneous multi-core architectures as a way to reduce processor power dissipation.…”
Section: Related Work
confidence: 99%
“…Multi-core processors have better performance and area characteristics than complex single-core processors. [5] They assess single-ISA heterogeneous multi-core architectures as a method to decrease processor power dissipation.…”
Section: Related Work
confidence: 99%
“…Two notable projects in this category are Charm++ [36] and PaRSEC [9], which represent algorithms and their implementations as a Directed Acyclic Graph (DAG) of tasks connected by edges that communicate data between them, a concept clearly related to the dataflow paradigm. Many other systems offer a similar paradigm but may not provide the same level of support for distributed-memory parallelism [5,47].…”
Section: Introduction
confidence: 99%