Proceedings of the 18th ACM SIGPLAN International Conference on Functional Programming 2013
DOI: 10.1145/2500365.2500595

Optimising purely functional GPU programs

Abstract: Purely functional, embedded array programs are a good match for SIMD hardware, such as GPUs. However, the naive compilation of such programs quickly leads to both code explosion and an excessive use of intermediate data structures. The resulting slowdown is not acceptable on target hardware that is usually chosen to achieve high performance. In this paper, we discuss two optimisation techniques, sharing recovery and array fusion, that tackle code explosion and eliminate superfluous intermediate structures. Both …
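
The code-explosion half of this can be pictured with a toy deep embedding. The following is a minimal sketch in Haskell (a hypothetical two-constructor expression type, not Accelerate's actual AST): sharing that exists in the host program is lost when the embedded expression is reified as a tree.

-- Toy embedded expression language (hypothetical, for illustration only).
data Exp = Lit Int | Add Exp Exp
  deriving Show

-- The argument `x` is shared on the host-language (Haskell) side ...
square :: Exp -> Exp
square x = Add x x

-- ... but the reified term contains 2^3 = 8 copies of `Lit 1`: without
-- sharing recovery, the tree (and the code generated from it) grows
-- exponentially with each application of `square`.
term :: Exp
term = square (square (square (Lit 1)))

main :: IO ()
main = print term

Sharing recovery reintroduces the lost sharing so that each repeated subterm is compiled only once; array fusion, the second technique, removes the intermediate arrays that would otherwise be materialised between collective operations.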

Cited by 72 publications (28 citation statements); references 99 publications.

“…As in Accelerate [9,30], we deliberately restrict ourselves to a set of primitives for which we know that high performance CPU and GPU implementations exist. In contrast to Accelerate, we allow nesting of primitives to express nested parallelism.…”
Section: Algorithmic Primitives
confidence: 99%
“…When generating code, these rules in effect allow us to fuse the implementation of different functions and avoid having to store temporary results. The functional programming community has studied more sophisticated and generic rules for fusion [13,26,30]. However, for our current restricted set of benchmarks our simpler fusion rules have proven to be sufficient.…”
Section: Reduce Rules
confidence: 99%
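
For orientation, the simplest instance of such a rule, the classic map/map fusion law, can be written in Haskell as a GHC rewrite rule over a stand-in combinator. This is only a sketch of the general idea (mapA is hypothetical and list-based), not the actual rules of the cited systems.

module FusionSketch where

-- Stand-in for a collective array operation (hypothetical, list-based).
-- NOINLINE keeps the definition opaque so that the rewrite rule can fire.
mapA :: (a -> b) -> [a] -> [b]
mapA = map
{-# NOINLINE mapA #-}

-- map/map fusion: the two traversals collapse into one, so the temporary
-- result of the inner mapA is never stored.
{-# RULES
"mapA/mapA"  forall f g xs.  mapA f (mapA g xs) = mapA (f . g) xs
  #-}
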
“…This process involves mapping of parallelism, performing optimizations such as fusion of operations and finally code generation. This approach is used by many systems such as Copperhead [5], Delite [3], Accelerate [6,15], LiquidMetal [9], HiDP [21], Halide [16] and NOVA [8].…”
Section: Related Work and Motivation
confidence: 99%
“…High-level languages such as Lift [18], Accelerate [15], Delite [19], StreamIt [20] or Halide [16] have been proposed to ease programming of GPUs. These approaches are all based on parallel patterns, a concept developed in the late 80's [7].…”
Section: Introduction
confidence: 99%
“…Data parallel patterns are implemented as fixed OpenCL kernels lacking portability. Accelerate [13] is a domain specific language embedded in Haskell for GPU programming. The implementation relies on templates of manually written CUDA kernels.…”
Section: High-level GPU Programming Approaches
confidence: 99%
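
For reference, the canonical Accelerate program, a dot product, looks roughly as follows. This is a minimal sketch based on the published accelerate package (the module names and the reference interpreter's run are assumptions about that package, not text from the paper); the GPU back ends expose a run of the same shape.

import Data.Array.Accelerate             as A
import Data.Array.Accelerate.Interpreter (run)   -- reference back end
import Prelude                           as P

-- Dot product built from collective operations; the zipWith/fold pair is a
-- typical target for the array fusion discussed in the paper, which aims to
-- avoid materialising the intermediate array of products.
dotp :: Acc (Vector Float) -> Acc (Vector Float) -> Acc (Scalar Float)
dotp xs ys = A.fold (+) 0 (A.zipWith (*) xs ys)

main :: IO ()
main = do
  let n  = 1000 :: Int
      xs = A.fromList (Z :. n) (P.replicate n 1) :: Vector Float
      ys = A.fromList (Z :. n) (P.replicate n 2) :: Vector Float
  print (run (dotp (use xs) (use ys)))

In Accelerate, sharing recovery, fusion, and code generation take place when run executes the embedded program, which is what allows a back end to compile dotp as a single traversal rather than one kernel per combinator.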