Generating performance portable code using rewrite rules: from high-level functional expressions to high-performance OpenCL code

Steuwer, Michel; Fensch, Christian; Lindley, Sam; Dubach, Christophe

doi:10.1145/2784731.2784754

Cited by 88 publications

(17 citation statements)

References 40 publications

“…However, true performance portability cannot be achieved with these standards, as optimized code/directives vastly differ on each platform (especially in the case of FPGAs). Other frameworks mentioned below [7,18,31,45,58,61,63] also support imperative and massively parallel architectures (CPUs, GPUs), where Halide and Tiramisu have been extended [62] to target FPGA kernels. As opposed to SDFGs, none of the above models were designed to natively support both load/store architectures and reconfigurable hardware.…”

Section: Related Work (mentioning)

confidence: 99%

“…As the SDFG provides general-purpose state machines with dataflow, all the above models can be fully represented within it, where SDFGs have the added benefit of encapsulating fine-grained data dependencies. [18,44,51,58,60,63] provide a fixed set of high-level program transformations, similar to those presented on SDFGs. In particular, Halide's schedules are by definition data-centric, and the same applies to polyhedral loop transformations in CHiLL.…”

Section: Related Work (mentioning)

confidence: 99%

Stateful dataflow multigraphs

Ben-Nun, Licht, Ziogas et al. 2019

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

The ubiquity of accelerators in high-performance computing has driven programming complexity beyond the skill-set of the average domain scientist. To maintain performance portability in the future, it is imperative to decouple architecture-specific programming paradigms from the underlying scientific computations. We present the Stateful DataFlow multiGraph (SDFG), a data-centric intermediate representation that enables separating program definition from its optimization. By combining fine-grained data dependencies with high-level control-flow, SDFGs are both expressive and amenable to program transformations, such as tiling and double-buffering. These transformations are applied to the SDFG in an interactive process, using extensible pattern matching, graph rewriting, and a graphical user interface. We demonstrate SDFGs on CPUs, GPUs, and FPGAs over various motifs, from fundamental computational kernels to graph analytics. We show that SDFGs deliver competitive performance, allowing domain scientists to develop applications naturally and port them to approach peak hardware performance without modifying the original scientific code.

HPC programmers have long sacrificed ease of programming and portability for achieving better performance. This mindset was established at a time when computer nodes had a single processor/core and were programmed with C/Fortran and MPI. The last decade, witnessing the end of Dennard scaling and Moore's law, brought a flurry of new technologies into the compute nodes. Those range from simple multi-core and manycore CPUs to heterogeneous GPUs and specialized FPGAs. To support those architectures, the complexity of OpenMP's specification grew by more than an order of magnitude from 63 pages in OpenMP 1.0 to 666 pages in OpenMP 5.0. This one example illustrates how (performance) programming complexity shifted from network scalability to node

“…The community will be encouraged to share their implementations of the basic building blocks of CNNs: from high-level, platform-agnostic descriptions (e.g. as functional expressions [18] or programs in PENCIL [9]) to low-level, platform-specific kernels (as can be found in vendor-optimized libraries).…”

Section: Open Call for Collaborative Optimization of CNNs (mentioning)

confidence: 99%

Optimizing convolutional neural networks on embedded platforms with OpenCL

Lokhmotov, Fursin 2016

Proceedings of the 4th International Workshop on OpenCL

“…Dedicated FP languages were proposed like NOVA, from NVIDIA [6]. In a separate context rewrite rules were investigated to generate low level representations of high-level parallel constructs [7]. Our approach is different from these works in the following way: first, these languages are inaccessible for the mainstream scientist, who are familiar with C/C++, and where only low level APIs are available.…”

Section: Related Work (mentioning)

confidence: 99%

C++ EDSL for parallel code generation

Berényi

2015

2015 Conference Grid, Cloud & High Performance Computing in Science (ROLCG)

Code generation is ubiquitous for modern high-performance computing (HPC) to provide efficient but highly parametrizable program development. Many times functional dependencies should be made available for the user to manipulate, and such arbitrary functions should be efficiently parallelized over multiple levels. We propose an embedded domain specific language inside C++ for manipulating abstract syntax trees (ASTs) that can represent arbitrary computation, and that such language can be extended with constructs for parallelism and functional programming.
