Optimising purely functional GPU programs

McDonell, Trevor L.; Chakravarty, Manuel M. T.; Keller, Gabriele; Lippmeier, Ben

doi:10.1145/2544174.2500595

Cited by 32 publications

(44 citation statements)

References 36 publications

“…There are many EDSLs that rely on a higher-order interface towards the user and a first-order representation for analysis and code generation: Lava [3], Pan [6], Nikola [8], Accelerate [10], Obsidian [12] and Feldspar [1], to name some. All of these EDSLs employ some kind of higher-order to first-order conversion.…”

Section: Discussion and Related Work (mentioning)

confidence: 99%

Using circular programs for higher-order syntax

Axelsson

Claessen

2013

Proceedings of the 18th ACM SIGPLAN International Conference on Functional Programming


This pearl presents a novel technique for constructing a first-order syntax tree directly from a higher-order interface. We exploit circular programming to generate names for new variables, resulting in a simple yet efficient method. Our motivating application is the design of embedded languages supporting variable binding, where it is convenient to use higher-order syntax when constructing programs, but first-order syntax when processing or transforming programs.
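The idea in this abstract can be illustrated with a minimal Haskell sketch. The names `Exp`, `maxBV` and `lam` are chosen here for illustration and may not match the paper's definitions. The sketch assumes terms are only ever built through `lam`, so that every binder is numbered higher than any binder inside its body.

```haskell
-- First-order syntax: binders carry explicit integer names.
type Name = Int

data Exp
  = Var Name
  | App Exp Exp
  | Lam Name Exp
  deriving (Eq, Show)

-- Largest bound name in a term, or 0 if the term binds nothing.
-- It never forces the name stored in a Var, and for a Lam it returns
-- the binder itself, relying on the assumed invariant that a binder
-- is numbered higher than every binder in its body.
maxBV :: Exp -> Name
maxBV (Var _)   = 0
maxBV (App f a) = maxBV f `max` maxBV a
maxBV (Lam n _) = n

-- Higher-order interface: the user supplies a Haskell function and
-- gets back a first-order term. The definition is circular: the body
-- mentions n, and n is computed from the body. Laziness makes this
-- well defined as long as f does not inspect its argument's name.
lam :: (Exp -> Exp) -> Exp
lam f = Lam n body
  where
    body = f (Var n)
    n    = maxBV body + 1
```

With these definitions, `lam (\x -> lam (\y -> App x y))` evaluates to `Lam 2 (Lam 1 (App (Var 2) (Var 1)))`: the inner binder is named first, and the outer binder takes the next free number. A function that pattern-matches on the name inside its argument would make the circular definition diverge.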


“…Mainland [2] recently extended Template Haskell with support for quasiquoting arbitrary programming languages, which greatly simplifies writing code generators that produce complex C, CUDA, OpenCL, or Objective-C code by writing code templates in the syntax of the generated language; for example, Accelerate, an embedded language for GPU programming, makes extensive use of that facility to generate CUDA GPU code [3]. In this demo, I will show that quasiquoting also enables a new form of language interoperability. Here, a simple example using Objective-C: nslog :: String -> IO () nslog msg = $(objc ['msg :> ''String] (void [cexp| NSLog(@"A message from Haskell: %@", msg) |]))…”

Section: Extended Abstract (mentioning)

confidence: 99%

Foreign inline code

Chakravarty

2014

Proceedings of the 2014 ACM SIGPLAN Symposium on Haskell

Self Cite


Extended Abstract

Template Haskell, the Glasgow Haskell Compiler's (GHC) metaprogramming framework [4], is widely used to define macros, code generators, or even code transformation engines. Mainland [2] recently extended Template Haskell with support for quasiquoting arbitrary programming languages, which greatly simplifies writing code generators that produce complex C, CUDA, OpenCL, or Objective-C code by writing code templates in the syntax of the generated language. For example, Accelerate, an embedded language for GPU programming, makes extensive use of that facility to generate CUDA GPU code [3].

In this demo, I will show that quasiquoting also enables a new form of language interoperability. Here, a simple example using Objective-C:

nslog :: String -> IO ()
nslog msg = $(objc ['msg :> ''String] (void [cexp| NSLog(@"A message from Haskell: %@", msg) |]))

The expression splice $(objc ...) introduces an inline Objective-C expression into Haskell code. Its first argument (which here is ['msg :> ''String]) is a list of all Haskell variables that are used in the Objective-C code and automatically marshalled to it. In Template Haskell, the syntax 'msg quotes a variable name and ''String quotes a type constructor name. The infix operator (:>) is used to annotate variables with marshalling information, in this case, the type used for type-guided marshalling. The quasiquoter [cexp|...|] quotes C expressions, returning a representation of the quoted expression as an abstract syntax tree. Here, the expression calls the function NSLog(), which on OS X and iOS writes a log message.

As Objective-C is a strict superset of ANSI C, this works for inline ANSI C code as well. With appropriate support by a quasiquotation library, this approach could also be used for other languages, such as Java or C++. It might even be plausible to inline scripting languages, such as Ruby or Python.
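The name-quoting syntax described in this abstract can be tried on its own, without the objc splice. The following is a small sketch using only the standard template-haskell library; the bindings `msg`, `quotedValue` and `quotedType` are hypothetical names introduced for illustration and are not part of the cited work.

```haskell
{-# LANGUAGE TemplateHaskell #-}
import Language.Haskell.TH (Name, nameBase)

-- Hypothetical top-level binding, present only so that there is a
-- variable name to quote.
msg :: String
msg = "A message from Haskell"

-- A single quote yields the Name of a value-level identifier.
quotedValue :: Name
quotedValue = 'msg

-- A double quote yields the Name of a type-level identifier.
quotedType :: Name
quotedType = ''String
```

Here `nameBase quotedValue` is `"msg"` and `nameBase quotedType` is `"String"`. Both quotes produce ordinary `Name` values, which is what allows a list such as ['msg :> ''String] to pair a variable with the type that guides its marshalling.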


“…Accelerate [23] uses an elaboration of the delayed arrays representation from Repa, and in particular manages to avoid duplicating work. All array operations have a uniform representation as constructors for delayed arrays, on which fusion is performed by tree contraction.…”

Section: Related Work (mentioning)

confidence: 99%

A T2 graph-reduction approach to fusion

Henriksen

Oancea

2013

Proceedings of the 2nd ACM SIGPLAN Workshop on Functional High-Performance Computing


Fusion is one of the most important code transformations as it has the potential to substantially optimize both the memory hierarchy time overhead and, sometimes asymptotically, the space requirement. In functional languages, fusion is naturally and relatively easily derived as a producer-consumer relation between program constructs that expose a richer, higher-order algebra of program invariants, such as the map-reduce list homomorphisms.

In imperative languages, fusing producer-consumer loops requires dependency analysis on arrays applied at loop-nest level. Such analysis, however, has often been labeled as "heroic effort" and, if at all, is supported only in its simplest and most conservative form in industrial compilers.

Related implementations in the functional context typically apply fusion only when the to-be-fused producer is used exactly once, i.e., in the consumer. This guarantees that the transformation is conservative: the resulting program does not duplicate computation.

We show that the above restriction is more conservative than needed, and present a structural-analysis technique, inspired by the T1-T2 transformation for reducible data flow, that enables fusion even in some cases when the producer is used in different consumers and without duplicating computation.

We report an implementation of the fusion algorithm for a functional-core language, named L0, which is intended to support nested parallelism across regular multi-dimensional arrays. We succinctly describe L0's semantics and the compiler infrastructure on which the fusion transformation relies, and present compiler-generated statistics related to fusion on a set of six benchmarks.
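The producer-consumer view of fusion, and the delayed-array representation mentioned in the citation statement above, can be sketched in a few lines of Haskell. This is a generic illustration under simplifying assumptions (one-dimensional arrays, list-based consumers); the names `Delayed`, `generate`, `mapD`, `zipWithD`, `foldD` and `toList` are invented here and are not the API of Accelerate, Repa, or L0.

```haskell
import Data.List (foldl')

-- A delayed array: a length plus a function from index to element.
-- No element is computed until a consumer asks for it.
data Delayed a = Delayed Int (Int -> a)

-- Producer: describe an array by its indexing function.
generate :: Int -> (Int -> a) -> Delayed a
generate = Delayed

-- Mapping composes functions instead of allocating a new array, so a
-- chain of maps becomes a single indexing function.
mapD :: (a -> b) -> Delayed a -> Delayed b
mapD g (Delayed n f) = Delayed n (g . f)

-- Element-wise combination of two producers, truncated to the
-- shorter length.
zipWithD :: (a -> b -> c) -> Delayed a -> Delayed b -> Delayed c
zipWithD g (Delayed n f) (Delayed m h) =
  Delayed (min n m) (\i -> g (f i) (h i))

-- Consumer: a strict left fold over all elements.
foldD :: (b -> a -> b) -> b -> Delayed a -> b
foldD g z (Delayed n f) = foldl' g z (map f [0 .. n - 1])

-- Consumer: compute every element and return them as a list.
toList :: Delayed a -> [a]
toList (Delayed n f) = map f [0 .. n - 1]
```

For example, `toList (mapD (+1) (mapD (*2) (generate 4 id)))` yields `[1,3,5,7]` with both maps applied in one pass per element. The sketch also shows why implementations often fuse only producers with a single use: in `zipWithD (+) xs xs`, the indexing function of `xs` runs twice for every index, which duplicates work whenever `xs` is expensive to compute.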
