2017
DOI: 10.48550/arxiv.1702.02017
Preprint

A Tight I/O Lower Bound for Matrix Multiplication

Abstract: A tight lower bound for required I/O when computing an ordinary matrix-matrix multiplication on a processor with two layers of memory is established. Prior work obtained weaker lower bounds by reasoning about the number of segments needed to perform C := AB, for distinct matrices A, B, and C, where each segment is a series of operations involving M reads and writes to and from fast memory, and M is the size of fast memory. A lower bound on the number of segments was then determined by obtaining an upper bound …

Cited by 7 publications (17 citation statements)
References 17 publications
“…Smith et al [Smith et al 2019] introduced a generalization of this argument, leading to tighter bounds in many cases. The idea is to decompose the execution into segments with T loads.…”
Section: Partitioning (mentioning)
confidence: 99%
“…3.3 on a CDAG G = (V, E), using its compact representation as a DFG. We also present, in 5.1.1, a generalization of one of the techniques introduced in [Dongarra et al 2008; Lowery and Langou 2014; Smith et al 2019; Smith and van de Geijn 2017] that these authors used to derive a tighter lower bound for matrix multiplication.…”
Section: K-partition Bound Derivation (mentioning)
confidence: 99%
“…Smith et al [26] starts with a simple model of memory with two layers of memory: a small, fast memory with capacity of M elements and a large, slow memory with unlimited capacity. It shows that any algorithm for ordinary MMM must read at least 2mnk/√M − 2M elements from slow memory and additionally write at least mn − M elements to slow memory.…”
Section: An I/O Lower Bound For MMM (mentioning)
confidence: 99%
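The bound quoted above can be evaluated numerically. The following is a minimal sketch, assuming m, n, and k are the matrix dimensions (C is m × n, A is m × k, B is k × n) and M is the fast-memory capacity in elements. The function name `mmm_io_lower_bounds` is hypothetical, and clamping negative values to zero is an assumption of this sketch (a negative bound carries no information), not something stated in the quoted text.

```python
import math


def mmm_io_lower_bounds(m, n, k, fast_mem):
    """Evaluate the quoted I/O lower bounds for C := AB.

    m, n, k  : matrix dimensions (C is m x n, A is m x k, B is k x n)
    fast_mem : capacity M of fast memory, in elements

    Returns (reads, writes), where
      reads  = 2mnk / sqrt(M) - 2M   (elements read from slow memory)
      writes = mn - M                (elements written to slow memory)
    Negative values are clamped to zero (an assumption of this sketch).
    """
    if fast_mem <= 0:
        raise ValueError("fast_mem must be positive")
    reads = 2 * m * n * k / math.sqrt(fast_mem) - 2 * fast_mem
    writes = m * n - fast_mem
    return max(reads, 0.0), max(writes, 0)


if __name__ == "__main__":
    # Illustrative sizes only: square 1024 x 1024 matrices, M = 4096 elements.
    reads, writes = mmm_io_lower_bounds(1024, 1024, 1024, 4096)
    print(f"read lower bound:  {reads:.0f} elements")
    print(f"write lower bound: {writes} elements")
```

For the illustrative sizes in the usage example, the read bound dominates the write bound by more than an order of magnitude, which is consistent with the citing text treating reads and writes separately.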
“…In [26], it is shown that three algorithms, named Resident A, Resident B, and Resident C, attain the lower bound on the number of reads from slow memory. Additionally, Resident C attains the lower bound on the number of writes to slow memory.…”
Section: Resident Algorithms For MMM (mentioning)
confidence: 99%