Designing Linear Algebra Algorithms by Transformation: Mechanizing the Expert Developer

Marker, Bryan; Poulson, Jack; Batory, Don; Geijn, Robert A.

doi:10.1007/978-3-642-38718-0_34

Cited by 21 publications

(43 citation statements)

References 12 publications


“…The FLAME interfaces for indexing are also included to omit indexing in favor of reasoning about matrix partitions. The benefit of these interfaces is that parallelizing most sequential DLA algorithms in high-performance Elemental code is rote (this is described and automated in [6,7]). An expert needs to decide which distributions are efficient and how to redistribute between them.…”

Section: Experiments With Interfaces (mentioning)

confidence: 99%

“…With Elemental experts predict runtime to choose which parallelization schemes to use or optimizations to apply. Estimates are first-order approximations in terms of the amount of computation performed and the amount of data communicated between processes [6,7]. Thanks to the interfaces in Elemental, BLIS, and libflame, relatively rough cost estimates are good enough to guide experts without having to implement, compile, run, and time code.…”

Section: Encoding Expert Knowledge For Automatic Code Generation (mentioning)

confidence: 99%

“…In the case of DLA libraries, much of an expert's development work is rote thanks to good abstraction, and we can indeed automate it. In this section, we present the basics of DxT [6,7], which is used to encode expert knowledge about DLA interfaces. A system can then utilize that knowledge to generate high performance code.…”

Section: Encoding Expert Knowledge For Automatic Code Generation (mentioning)

confidence: 99%

“…To explore the idea of automated program generation for DLA, we have a prototype system called DxTer [6]. DxTer takes an input graph and transformations and outputs a high performance implementation of the input graph.…”

Section: Encoding Expert Knowledge For Automatic Code Generation (mentioning)

confidence: 99%


Interfaces are key

Marker, Geijn, Batory (2013)

Proceedings of the 1st International Workshop on Software Engineering for High Performance Computing in Computational Science A

Self Cite



“…Another different work seeks to update BLAS by extending it with additional functionalities [26]. Build-to-order BLAS [27] and Design-by-transformation BLAS [28] approach the problem from a different angle. Their goal is to generate optimized and tuned BLAS-like functions from high level kernel specifications.…”

Section: Related Work (mentioning)

confidence: 99%

A Code Generation Framework for Targeting Optimized Library Calls for Multiple Platforms

Tan, Tang, Goh, et al. (2015)

IEEE Trans. Parallel Distrib. Syst.


Directive-based programming approaches such as OpenMP and OpenACC have gained popularity due to their ease of programming. These programming models typically involve adding compiler directives to code sections such as loops in order to parallelize them for execution on multicore CPUs or GPUs. However, one problem with this approach is that existing compilers generate code directly from the annotated sections and do not make use of hardware-specific architectural features. As a result, the generated code is unable to fully exploit the capabilities of the underlying hardware. Alternatively, we propose a code generation framework in which linear algebraic operations in the annotated codes are recognized, extracted and mapped to optimized vendor-provided platform-specific library calls. We demonstrate that such an approach can result in better performance in the generated code compared to those which are generated by existing compilers. This is substantiated by experimental results on multicore CPUs and GPUs.
