Revisiting Matrix Product on Master-Worker Platforms

Dongarra, Jack; Pineau, Jean-François; Robert, Yves; Shi, Zhiao; Vivien, Frédéric

doi:10.1109/ipdps.2007.370466

Cited by 6 publications (7 citation statements)

References 43 publications


“…Adding these two lower bounds gives a lower bound on the number of transfers between slow and fast memory, called the I/O lower bound, of approximately 2mnk/√M. Importantly, this lower bound is tight, modulo lower-order terms. It improves upon previous work [3,13,15].…”

Section: An I/O Lower Bound for MMM (supporting)

confidence: 86%


The MOMMS Family of Matrix Multiplication Algorithms

Smith, van de Geijn, 2019, Preprint


As the ratio between the rate of computation and the rate at which data can be retrieved from various layers of memory continues to deteriorate, a question arises: Will the current best algorithms for computing matrix-matrix multiplication on future CPUs continue to be (near) optimal? This paper provides compelling analytical and empirical evidence that the answer is "no". The analytical results guide us to a new family of algorithms of which the current state-of-the-art "Goto's algorithm" is but one member. The empirical results, on architectures that were custom built to reduce the amount of bandwidth to main memory, show that under different circumstances, different and particular members of the family become superior. Thus, this family will likely start playing a prominent role going forward.

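As a quick sanity check of the leading-order term 2mnk/√M quoted in the first citation statement, the bound can be evaluated for concrete sizes. This is an illustrative sketch only: the helper name `io_lower_bound` is hypothetical, M is assumed to be measured in matrix elements (words), and lower-order terms are dropped, so the result approximates rather than reproduces the exact bound derived in the citing paper.

```python
import math


def io_lower_bound(m: int, n: int, k: int, M: int) -> float:
    """Leading-order I/O lower bound 2mnk/sqrt(M) for C (m x n) := A (m x k) B (k x n).

    Hypothetical helper: M is the fast-memory capacity in elements, and the
    lower-order terms of the exact bound are deliberately ignored.
    """
    if min(m, n, k, M) <= 0:
        raise ValueError("dimensions and fast-memory size must be positive")
    return 2 * m * n * k / math.sqrt(M)


# Square 1024 x 1024 matrices with room for 4096 elements in fast memory.
print(io_lower_bound(1024, 1024, 1024, 4096))  # 33554432.0
```

For n × n matrices this reduces to 2n³/√M; because the term scales as 1/√M, quadrupling the fast-memory size only halves the bound.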

“…In [26], it is shown that three algorithms, named Resident A, Resident B, and Resident C, attain the lower bound on the number of reads from slow memory³. Additionally, Resident C attains the lower bound on the number of writes to slow memory³.…”

Section: Resident Algorithms for MMM (mentioning)

confidence: 99%

The MOMMS Family of Matrix Multiplication Algorithms

Smith, van de Geijn, 2019, Preprint

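The idea behind keeping a block of C resident in fast memory can be illustrated by counting transfers in a simulated two-level memory. This is a hedged sketch under stated assumptions, not the exact algorithm analyzed in [26]: fast memory is assumed to hold M elements, a b × b block of C stays resident while one column segment of A and one row segment of B are streamed in per index p, and since C := AB the block starts at zero and is written back exactly once. The function name `resident_c_matmul` and the block-size rule b² + 2b ≤ M are hypothetical choices made for illustration.

```python
import math


def resident_c_matmul(A, B, M):
    """Compute C := A B with a resident-C blocked schedule (illustrative sketch).

    Assumptions: fast memory holds M elements; a b x b block of C plus one
    A column segment and one B row segment (b elements each) must fit, so b is
    the largest integer with b*b + 2*b <= M. Returns (C, reads, writes), where
    reads/writes count elements moved from/to simulated slow memory.
    """
    m, k, n = len(A), len(B), len(B[0])
    b = math.isqrt(M + 1) - 1  # largest b with (b + 1)^2 <= M + 1
    if b < 1:
        raise ValueError("fast memory too small: need M >= 3")
    C = [[0] * n for _ in range(m)]
    reads = writes = 0
    for i0 in range(0, m, b):
        i1 = min(i0 + b, m)
        for j0 in range(0, n, b):
            j1 = min(j0 + b, n)
            # Resident C block; C := AB starts at zero, so nothing is read.
            block = [[0] * (j1 - j0) for _ in range(i1 - i0)]
            for p in range(k):
                # Stream one column segment of A and one row segment of B.
                a_col = [A[i][p] for i in range(i0, i1)]
                b_row = B[p][j0:j1]
                reads += len(a_col) + len(b_row)
                # Rank-1 update performed entirely in fast memory.
                for r, a in enumerate(a_col):
                    row = block[r]
                    for c, bv in enumerate(b_row):
                        row[c] += a * bv
            # Write the finished block back to slow memory exactly once.
            for r in range(i1 - i0):
                C[i0 + r][j0:j1] = block[r]
            writes += (i1 - i0) * (j1 - j0)
    return C, reads, writes


C, reads, writes = resident_c_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], 8)
print(C, reads, writes)  # [[19, 22], [43, 50]] 8 4
```

When b divides m and n, the sketch reads exactly 2mnk/b elements of A and B and writes mn elements of C; as M grows, b approaches √M and the read count approaches the 2mnk/√M leading term.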

“…3.3 on a CDAG G = (V, E), using its compact representation as a DFG. We also present, in 5.1.1, a generalization of one of the techniques introduced in [Dongarra et al. 2008; Lowery and Langou 2014; Smith et al. 2019; Smith and van de Geijn 2017] that these authors used to derive a tighter lower bound for matrix multiplication.…”

Section: K-Partition Bound Derivation (mentioning)

confidence: 99%

Automated Derivation of Parametric Data Movement Lower Bounds for Affine Programs

Olivry, Langou, Pouchet, et al., 2019, Preprint


For most relevant computations, the energy and time needed for data movement dominate those for performing arithmetic operations on all computing systems today. Hence it is of critical importance to understand the minimal total data movement achievable during the execution of an algorithm. The achieved total data movement for different schedules of an algorithm can vary widely depending on how efficiently the cache is used, e.g., untiled versus effectively tiled matrix-matrix multiplication. A significant current challenge is that no existing tool is able to meaningfully quantify the potential reduction in the data movement of a computation that can be achieved by more effective use of the cache through operation rescheduling. Asymptotic parametric expressions of data movement lower bounds have previously been manually derived for a limited number of algorithms, often without scaling constants. In this paper, we present the first compile-time approach for deriving non-asymptotic parametric expressions of data movement lower bounds for arbitrary affine computations. The approach has been implemented in a fully automatic tool (IOLB) that can generate these lower bounds for input affine programs. IOLB's use is demonstrated by exercising it on all the benchmarks of the PolyBench suite. The advantages of IOLB are many: (1) IOLB enables us to derive bounds for a few dozen algorithms for which these lower bounds had never been derived, reflecting an increase in productivity through automation. (2) Anyone is able to obtain these lower bounds through IOLB; no expertise is required. (3) For some of the most well-studied algorithms, the lower bounds obtained by IOLB are higher than any previously reported manually derived lower bounds.
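The gap between untiled and tiled matrix-matrix multiplication mentioned in this abstract can be reproduced with a toy cache model. This is a hedged sketch, not the method used by IOLB: it assumes a fully associative LRU cache holding `capacity` individual elements, counts a transfer only when an element is touched while absent (write-backs are ignored), and uses hypothetical helper names (`LRUCache`, `untiled_misses`, `tiled_misses`).

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative LRU cache holding `capacity` single elements."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.misses = 0

    def access(self, key):
        if key in self.lines:
            self.lines.move_to_end(key)  # mark as most recently used
        else:
            self.misses += 1  # element must be transferred from slow memory
            self.lines[key] = None
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict least recently used


def untiled_misses(n, capacity):
    """Transfers for the plain i-j-p loop nest computing C += A B (n x n)."""
    cache = LRUCache(capacity)
    for i in range(n):
        for j in range(n):
            for p in range(n):
                cache.access(("A", i, p))
                cache.access(("B", p, j))
                cache.access(("C", i, j))
    return cache.misses


def tiled_misses(n, capacity, t):
    """Transfers for the same computation blocked into t x t tiles."""
    cache = LRUCache(capacity)
    for i0 in range(0, n, t):
        for j0 in range(0, n, t):
            for p0 in range(0, n, t):
                for i in range(i0, min(i0 + t, n)):
                    for j in range(j0, min(j0 + t, n)):
                        for p in range(p0, min(p0 + t, n)):
                            cache.access(("A", i, p))
                            cache.access(("B", p, j))
                            cache.access(("C", i, j))
    return cache.misses


print(untiled_misses(16, 64), tiled_misses(16, 64, 4))
```

The tile size t = 4 is chosen so that one tile each of A, B, and C (3t² = 48 elements) fits in the 64-element cache. Under these assumptions the untiled nest re-fetches every element of B for each row of A, while the tiled nest touches at most 48 distinct elements per tile triple, so it moves noticeably less data.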


“…Lemma 3.6 for the case where R is always equal to M can be found in [12], yielding an I/O lower bound of 3…”

Section: P (mentioning)

confidence: 99%

“…Together, Definition 3.2 and Lemma 3.3 represent a simplification of the S-partitioning problem, introduced in [16] and subsequently used in other I/O complexity lower bounds for MMM [12,17]. Segments are very similar to the subcalculations from the S-Span theorem in [16] and phases from [17].…”

mentioning

confidence: 99%

A Tight I/O Lower Bound for Matrix Multiplication

Smith, Lowery, Langou, et al., 2017, Preprint


A tight lower bound on the required I/O when computing an ordinary matrix-matrix multiplication on a processor with two layers of memory is established. Prior work obtained weaker lower bounds by reasoning about the number of segments needed to perform C := AB, for distinct matrices A, B, and C, where each segment is a series of operations involving M reads and writes to and from fast memory, and M is the size of fast memory. A lower bound on the number of segments was then determined by obtaining an upper bound on the number of elementary multiplications performed per segment. This paper follows the same high-level approach, but improves the lower bound by (1) transforming algorithms for MMM so that they perform all computation via fused multiply-add instructions (FMAs) and using this to reason about only the cost associated with reading the matrices, and (2) decoupling the per-segment I/O cost from the size of fast memory. For n × n matrices, the lower bound's leading-order term is 2n³/√M. A theoretical algorithm whose leading terms attain this bound is introduced. The extent to which the state-of-the-art Goto's Algorithm attains the lower bound is discussed.
