Code Generation and Optimization of Distributed-memory Dense Linear Algebra Kernels

Marker, Bryan; Batory, Don; Geijn, Robert A.

doi:10.1016/j.procs.2013.05.295

Cited by 8 publications

(24 citation statements)

References 8 publications

“…The FLAME interfaces for indexing are also included to omit indexing in favor of reasoning about matrix partitions. The benefit of these interfaces is that parallelizing most sequential DLA algorithms in high-performance Elemental code is rote (this is described and automated in [6,7]). An expert needs to decide which distributions are efficient and how to redistribute between them.…”

Section: Experiments With Interfaces (mentioning)

confidence: 99%

“…In the case of DLA libraries, much of an experts' development work is rote thanks to good abstraction, and we can indeed automate it. In this section, we present the basics of DxT [6,7], which is used to encode expert knowledge about DLA interfaces. A system can then utilize that knowledge to generate high performance code.…”

Section: Encoding Expert Knowledge For Automatic Code Generation (mentioning)

confidence: 99%

“…With Elemental experts predict runtime to choose which parallelization schemes to use or optimizations to apply. Estimates are first-order approximations in terms of the amount of computation performed and the amount of data communicated between processes [6,7]. Thanks to the interfaces in Elemental, BLIS, and libflame, relatively rough cost estimates are good enough to guide experts without having to implement, compile, run, and time code.…”

Section: Encoding Expert Knowledge For Automatic Code Generation (mentioning)

confidence: 99%
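
The first-order estimate described in the statement above (computation performed plus data communicated) can be illustrated with a small sketch. This is not the cost model used by DxT or Elemental; it assumes a generic latency/bandwidth/flop-rate model, and every name here (`Machine`, `Candidate`, `estimate_seconds`, `cheapest`) and every parameter value is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    """Hypothetical machine parameters; real values would be measured."""
    gamma: float  # seconds per floating-point operation
    alpha: float  # seconds of latency per message
    beta: float   # seconds per byte communicated


@dataclass(frozen=True)
class Candidate:
    """One candidate implementation, described only by first-order counts."""
    name: str
    flops: float        # floating-point operations per process
    messages: int       # messages sent per process
    bytes_moved: float  # bytes communicated per process


def estimate_seconds(c: Candidate, m: Machine) -> float:
    """First-order runtime estimate: computation time plus communication time."""
    return c.flops * m.gamma + c.messages * m.alpha + c.bytes_moved * m.beta


def cheapest(candidates: list[Candidate], m: Machine) -> Candidate:
    """Pick the candidate with the smallest estimated runtime."""
    return min(candidates, key=lambda c: estimate_seconds(c, m))


if __name__ == "__main__":
    machine = Machine(gamma=1e-9, alpha=1e-6, beta=1e-9)  # assumed values
    options = [
        Candidate("few_large_messages", flops=1e9, messages=10, bytes_moved=1e6),
        Candidate("many_small_messages", flops=1e9, messages=1000, bytes_moved=1e8),
    ]
    print(cheapest(options, machine).name)
```

Because the estimate only ranks candidates against each other, rough parameter values may be enough to separate a good choice from a poor one without compiling and timing code.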

Interfaces are key

Marker; Geijn; Batory

2013

Proceedings of the 1st International Workshop on Software Engineering for High Performance Computing in Computational Science A

Self Cite

“…We have automated the exploration of these spaces (by generating all implementations using a methodical process) and we evaluate the efficiency of each implementation via cost estimation. This is how we find the best-performing algorithm that experts would intuitively select [17,18,19]. In all tests, generated code is the same or better than experts' hand-produced implementations.…”

Section: Introduction (mentioning)

confidence: 99%

“…We begin with a brief overview of how we generate the space of implementations for a given operation in the domain. Our approach is called Design by Transformation (DxT); more details are given in [9,17,18,19,27].…”

Section: Introduction (mentioning)

confidence: 99%

Understanding performance stairs

Marker; Batory; Geijn

2014

Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering

Self Cite

How do experts navigate the huge space of implementations for a given specification to find an efficient choice with minimal searching? Answer: They use "heuristics", rules of thumb that are more street wisdom than scientific fact. We provide a scientific justification for Dense Linear Algebra (DLA) heuristics by showing that only a few decisions (out of many possible) are critical to performance; once these decisions are made, the die is cast and only relatively minor performance improvements are possible. The (implementation × performance) space of DLA is stair-stepped. Each stair is a set of implementations that have very similar performance and (surprisingly) share key design decision(s). High-performance stairs align with heuristics that prescribe certain decisions in a particular context. Stairs also tell us how to tailor the search engine of a DLA code generator to reduce the time it needs to find implementations that are as good as or better than those crafted by experts.
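
The stair-stepped shape described in the abstract can be pictured with a small sketch that groups implementations by similar estimated cost. The grouping rule below (a relative gap threshold against the cheapest member of the current stair) is an assumption chosen for illustration, not the procedure from the paper; `group_into_stairs`, `rel_gap`, and the sample costs are all hypothetical.

```python
def group_into_stairs(costs: dict[str, float], rel_gap: float = 0.10):
    """Group implementations into 'stairs' of similar estimated cost.

    Implementations are sorted by cost; a new stair starts whenever a cost
    exceeds the cheapest member of the current stair by more than rel_gap.
    """
    stairs: list[list[tuple[str, float]]] = []
    for name, cost in sorted(costs.items(), key=lambda kv: kv[1]):
        if stairs and cost <= stairs[-1][0][1] * (1 + rel_gap):
            stairs[-1].append((name, cost))   # same stair: similar cost
        else:
            stairs.append([(name, cost)])     # large jump: next stair
    return stairs


if __name__ == "__main__":
    # Made-up cost estimates for five hypothetical implementations.
    sample = {"impl_a": 1.0, "impl_b": 1.05, "impl_c": 2.0,
              "impl_d": 2.1, "impl_e": 5.0}
    for level, stair in enumerate(group_into_stairs(sample)):
        print(level, [name for name, _ in stair])
```

Under this reading, a search engine would only need to reach the lowest stair; the remaining choices within a stair change the estimate by a small amount.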
