2007 IEEE International Parallel and Distributed Processing Symposium
DOI: 10.1109/ipdps.2007.370466

Revisiting Matrix Product on Master-Worker Platforms

Abstract: This paper is aimed at designing efficient parallel matrix-product algorithms for heterogeneous master-worker platforms. While matrix-product is well-understood for homogeneous 2D-arrays of processors (e.g., Cannon algorithm and ScaLAPACK outer product algorithm), there are three key hypotheses that render our work original and innovative: (1) Centralized data. We assume that all matrix files originate from, and must be returned to, the master. The master distributes both data and computations to the workers (whi…

Cited by 6 publications (7 citation statements)
References 43 publications
“…Adding these two lower bounds gives a lower bound on the number of transfers between slow and fast memory, called the I/O lower bound, of approximately 2mnk/√M. Importantly, this lower bound is tight, modulo lower order terms. It improves upon previous work [3,13,15].…”
Section: An I/O Lower Bound For MMM
supporting
confidence: 86%
“…In [26], it is shown that three algorithms, named Resident A, Resident B, and Resident C, attain the lower bound on the number of reads from slow memory. Additionally, Resident C attains the lower bound on the number of writes to slow memory.…”
Section: Resident Algorithms For MMM
mentioning
confidence: 99%
“…3.3 on a CDAG G = (V, E), using its compact representation as a DFG. We also present, in 5.1.1, a generalization of one of the techniques introduced in [Dongarra et al. 2008; Lowery and Langou 2014; Smith et al. 2019; Smith and van de Geijn 2017] that these authors used to derive a tighter lower bound for matrix multiplication.…”
Section: K-partition Bound Derivation
mentioning
confidence: 99%
“…Lemma 3.6 for the case where R always equal to M can be found in [12], yielding an I/O lower bound of 3…”
Section: P
mentioning
confidence: 99%
“…Together, Definition 3.2 and Lemma 3.3 represent a simplification of the S-partitioning problem, introduced in [16] and subsequently used in other I/O complexity lower bounds for MMM [12,17]. Segments are very similar to the subcalculations from the S-Span theorem in [16] and phases from [17].…”
mentioning
confidence: 99%