Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis 2019
DOI: 10.1145/3295500.3356181

Red-blue pebbling revisited

Cited by 60 publications (11 citation statements)
References 38 publications
“…Communication-efficient algorithms have been developed for many concrete computational problems and models, including for example matrix multiplication [17,18], FFT [17,1], sorting [1], directed shortest paths [23], topological sorting [2], matrix transposition [1], the N-Body problem [14], QR- and LU-factorization [13], prime tables [5], and Cholesky decomposition [4]. In the blocked-I/O-model [1], using time-forward processing one can compute functions that have a given computation DAG G = (V, E) with O(Sort(|E|)) I/Os [10].…”
Section: Communication-efficient Algorithms (mentioning)
confidence: 99%
“…The red-blue pebble game allows to analyze and optimize the I/Os of general computations. For example, it has been used to optimize the I/Os of classical matrix multiplication [17,18], which can be considered very opposite to the computations of this paper as it allows extensive data reuse.…”
Section: Communication-efficient Algorithms (mentioning)
confidence: 99%
“…Computing an I/O complexity upper bound for an algorithm is the most reasonable way to assess the tightness of a lower bound. While this computation is usually done by hand using ad hoc techniques specific to each studied algorithm [1,12,23,28,31,36], Fauzia et al [15] proposed a heuristic that directly reasons on the CDAG, which unfortunately does not scale to real programs. Finding an upper bound for a fixed architecture can also be viewed as finding an optimized program transformation that minimizes data movement costs, which also implies being able to evaluate this cost.…”
Section: Related Work (mentioning)
confidence: 99%
“…This fact indicates that, sequentially executing the dataflow and assigning most of the effective on-chip memory to the outputs can reach the minimum off-chip memory access. Otherwise, if we perform the dataflow in parallel, The equation (21) means that fully utilizing the on-chip memory owned by each processor to produce the partial sum could maximize the output data reuse and reduce the data transmission in the memory hierarchy.…”
Section: Dataflow (mentioning)
confidence: 99%
“…After Hong & Kung established the I/O complexity theory [17], Savage developed the notion of S-span to derive Hong-Kung style lower bounds [23]. Kwasniewski et al provided a new proof of I/O complexity of matrix-matrix multiplication and designed a parallel algorithm to reach its lower bound [21]. Although the red-blue pebble game model has been proposed for many years [1-3, 12, 24, 28], it is still difficult to use this model to establish I/O lower bounds of composite algorithms which involve several different kinds of computational patterns [13].…”
Section: Related Work (mentioning)
confidence: 99%