Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming 2015
DOI: 10.1145/2784731.2784754

Generating performance portable code using rewrite rules: from high-level functional expressions to high-performance OpenCL code

Abstract: Computers have become increasingly complex with the emergence of heterogeneous hardware combining multicore CPUs and GPUs. These parallel systems exhibit tremendous computational power at the cost of increased programming effort resulting in a tension between performance and code portability. Typically, code is either tuned in a low-level imperative language using hardware-specific optimizations to achieve maximum performance or is written in a high-level, possibly functional, language to achieve portability a…

Cited by 88 publications (17 citation statements)
References 40 publications
“…However, true performance portability cannot be achieved with these standards, as optimized code/directives vastly differ on each platform (especially in the case of FPGAs). Other frameworks mentioned below [7,18,31,45,58,61,63] also support imperative and massively parallel architectures (CPUs, GPUs), where Halide and Tiramisu have been extended [62] to target FPGA kernels. As opposed to SDFGs, none of the above models were designed to natively support both load/store architectures and reconfigurable hardware.…”
Section: Related Work (mentioning)
confidence: 99%
“…As the SDFG provides general-purpose state machines with dataflow, all the above models can be fully represented within it, where SDFGs have the added benefit of encapsulating fine-grained data dependencies. [18,44,51,58,60,63] provide a fixed set of high-level program transformations, similar to those presented on SDFGs. In particular, Halide's schedules are by definition data-centric, and the same applies to polyhedral loop transformations in CHiLL.…”
Section: Related Work (mentioning)
confidence: 99%
“…The community will be encouraged to share their implementations of the basic building blocks of CNNs: from high-level, platform-agnostic descriptions (e.g. as functional expressions [18] or programs in PENCIL [9]) to low-level, platform-specific kernels (as can be found in vendor-optimized libraries).…”
Section: Open Call For Collaborative Optimization Of CNNs (mentioning)
confidence: 99%
“…Dedicated FP languages were proposed like NOVA, from NVIDIA [6]. In a separate context rewrite rules were investigated to generate low level representations of high-level parallel constructs [7]. Our approach is different from these works in the following way: first, these languages are inaccessible for the mainstream scientist, who are familiar with C/C++, and where only low level APIs are available.…”
Section: Related Work (mentioning)
confidence: 99%