2020
DOI: 10.48550/arxiv.2003.00119
Preprint

Communication-Optimal Tilings for Projective Nested Loops with Arbitrary Bounds

Abstract: Reducing communication, either between levels of a memory hierarchy or between processors over a network, is a key component of performance optimization (in both time and energy) for many problems, including dense linear algebra [BCD+14], particle interactions [DGK+13], and machine learning [DD18, GAB+18]. For these problems, which can be represented as nested-loop computations, previous tiling-based approaches [CDK+13, DR16] have been used to find both lower bounds on the communication required to exec…

Cited by 3 publications (4 citation statements)
References 5 publications

“…An input program is a collection of statements S enclosed in loop nests, each of the following form (we use the loop nest notation introduced by Dinh and Demmel [23]):…”
Section: Input Programs (mentioning)
confidence: 99%
“…Pebbling [13,26,37,45,56] Projection-based [8,15,20,21,23 However, as the local domains become larger and may be more efficiently pipelined and overlapped using asynchronous MPI routines and intra-node OpenMP parallelism, the advantage becomes significant (Figures 8 and 9). COnfLUX outperforms existing libraries up to three times (for P = 4, N = 4096, second-best library is SLATE; Figure 1) and COnfCHOX achieves up to 1.8 times speedup (e.g., P = 4, N = 4096, second-best is again SLATE).…”
Section: Number Of Nodes (mentioning)
confidence: 99%
“…We model the program execution as a computational directed acyclic graph (cDAG, details in Section 2. [23]): where (cf. Figure 3 for a summary) for each innermost loop iteration, statement S is an evaluation of some function f on m inputs, where every input is an element of array A_j, j = 1, .…”
Section: Input Programs (mentioning)
confidence: 99%
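
The loop-nest form quoted above can be illustrated with a minimal Python sketch. It is not taken from the cited papers: the names `run_loop_nest` and `matmul_example` are hypothetical, arrays are stored as dictionaries keyed by index tuples, and it assumes each array is subscripted by a subset of the loop indices (the "projective" case in the title) and that the previous value of the output element is passed to f as an extra input.

```python
from itertools import product


def run_loop_nest(bounds, f, inputs, output):
    """Run statement S at every point of a rectangular loop nest.

    bounds : tuple of loop bounds (L_1, ..., L_d)
    f      : function evaluated by S; receives the current output value
             followed by one element from each input array (assumption)
    inputs : list of (array, dims) pairs, where dims lists the loop
             indices that subscript that array
    output : (array, dims) pair for the array written by S
    """
    out_array, out_dims = output
    for point in product(*(range(b) for b in bounds)):
        # Each input is read at a projection of the iteration point.
        args = [arr[tuple(point[d] for d in dims)] for arr, dims in inputs]
        key = tuple(point[d] for d in out_dims)
        out_array[key] = f(out_array.get(key, 0), *args)
    return out_array


def matmul_example():
    """Matrix multiplication C[i,k] += A[i,j] * B[j,k] as a 3-deep nest."""
    A = {(i, j): i + j for i in range(2) for j in range(3)}
    B = {(j, k): j * k for j in range(3) for k in range(2)}
    return run_loop_nest(
        bounds=(2, 3, 2),                  # loops over i, j, k
        f=lambda acc, a, b: acc + a * b,   # the function f inside S
        inputs=[(A, (0, 1)), (B, (1, 2))],
        output=({}, (0, 2)),
    )
```

Calling `matmul_example()` returns the product as a dictionary keyed by `(i, k)`. Here every array drops exactly one of the three loop indices, which is the kind of access pattern that projection-based communication bounds reason about.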
“…Pebbling [11,25,36,42,52] Projection-based [7,13,18,20,22,49] Problem specific [1,8,15,45 As such, linear solvers are implemented in various libraries for shared-memory environments [3,4,19,30,33,47,59]. For distributed memory, vendor-optimized libraries [14,33] typically implement the ScaLAPACK interface [9], and are based on 2D decomposition, as we empirically verify (Section 8).…”
Section: Related Work (mentioning)
confidence: 99%