Eric Holk scite author profile

Matty

et al. 2015

Dynamic parallelism allows GPU kernels to launch additional kernels at runtime directly from the GPU. In this paper we show that dynamic parallelism enables relatively simple high-performance graph algorithms for GPUs. We present breadth-first search (BFS) and single-source shortest paths (SSSP) algorithms that use dynamic parallelism to adapt to the irregular and data-driven nature of these problems. Our approach results in simple code that closely follows the high-level description of the algorithms but yields performance competitive with the current state of the art.
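The level-synchronous structure that frontier-based BFS builds on can be sketched sequentially as below. This is a minimal illustration, not the paper's CUDA code: it assumes an adjacency-list graph stored as a Python dict, and the function name `bfs_levels` is hypothetical. The comments mark where a GPU version would parallelize over the frontier and where dynamic parallelism could launch a child kernel for a high-degree vertex.

```python
from typing import Dict, List


def bfs_levels(adj: Dict[int, List[int]], source: int) -> Dict[int, int]:
    """Level-synchronous BFS returning the depth of every reachable vertex.

    `adj` is an adjacency list (an assumed representation for this sketch).
    """
    depth = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        next_frontier = []
        # On a GPU, each vertex in the frontier could be handled by its own
        # thread; a vertex with many neighbors is where launching a child
        # kernel (dynamic parallelism) could balance the irregular work.
        for u in frontier:
            for v in adj.get(u, []):
                if v not in depth:
                    depth[v] = level
                    next_frontier.append(v)
        frontier = next_frontier
    return depth
```

For example, `bfs_levels({0: [1, 2], 1: [3], 2: [3], 3: []}, 0)` assigns depth 1 to vertices 1 and 2 and depth 2 to vertex 3, since vertex 3 is discovered only once.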

GPU Programming in Rust: Implementing High-Level Abstractions in a Systems-Level Language

Pathirage

Chauhan

et al. 2013

miniKanren, live and untagged

Byrd

Friedman

2012

We present relational interpreters for several subsets of Scheme, written in the pure logic programming language miniKanren. We demonstrate these interpreters running "backwards," that is, generating programs that evaluate to a specified value, and show how the interpreters can trivially generate quines (programs that evaluate to themselves). We demonstrate how to transform environment-passing interpreters written in Scheme into relational interpreters written in miniKanren. We show how constraint extensions to core miniKanren can be used to allow shadowing of the interpreter's primitive forms (using the absentᵒ tree constraint), and to avoid having to tag expressions in the languages being interpreted (using disequality constraints and symbol/number type constraints), simplifying the interpreters and eliminating the need for parsers/unparsers. We provide four appendices to make the code in the paper completely self-contained. Three of these appendices contain new code: the complete implementation of core miniKanren extended with the new constraints; an extended relational interpreter capable of running factorial and doing list processing; and a simple pattern matcher that uses Dijkstra guards. The other appendix presents our preferred version of code that has been presented elsewhere: the miniKanren relational arithmetic system used in the extended interpreter.
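The `==` goal at the core of miniKanren rests on first-order unification over a substitution. The following is a minimal Python sketch of that idea under stated assumptions: terms are `Var` objects, atoms, or tuples; the names `Var`, `walk`, `unify`, and `walk_all` are hypothetical; and, for brevity, it omits the occurs check as well as the disequality, type, and absentᵒ constraints described above.

```python
from typing import Any, Dict, Optional

Subst = Dict["Var", Any]


class Var:
    """A logic variable, compared and hashed by object identity."""

    def __init__(self, name: str):
        self.name = name

    def __repr__(self) -> str:
        return f"_{self.name}"


def walk(term: Any, subst: Subst) -> Any:
    # Follow variable bindings until reaching a non-variable or an unbound variable.
    while isinstance(term, Var) and term in subst:
        term = subst[term]
    return term


def unify(a: Any, b: Any, subst: Subst) -> Optional[Subst]:
    """Return a substitution extended so that a and b are equal, or None on failure.

    This sketch performs no occurs check.
    """
    a, b = walk(a, subst), walk(b, subst)
    if isinstance(a, Var) and isinstance(b, Var) and a is b:
        return subst
    if isinstance(a, Var):
        return {**subst, a: b}
    if isinstance(b, Var):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple):
        if len(a) != len(b):
            return None
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    # Atoms unify only if they are equal.
    return subst if a == b else None


def walk_all(term: Any, subst: Subst) -> Any:
    # Fully resolve a term, including variables nested inside tuples.
    term = walk(term, subst)
    if isinstance(term, tuple):
        return tuple(walk_all(t, subst) for t in term)
    return term
```

Because substitutions are never mutated, a failed branch simply discards its result, which is what lets a search explore alternatives such as running an interpreter "backwards."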

Meta-programming and auto-tuning in the search for high performance GPU code

Vollmer

Svensson

et al. 2015

Writing high-performance GPGPU code is often difficult and time-consuming, potentially requiring laborious manual tuning of low-level details. Despite these challenges, the cost of ignoring GPUs in high-performance computing is increasingly large. Auto-tuning is a potential solution to the problem of tedious manual tuning. We present a framework for auto-tuning GPU kernels that are expressed in an embedded DSL and that expose compile-time parameters for tuning. Our framework allows kernels to be polymorphic over the search strategy that tunes them, and allows search strategies to be implemented in the same metalanguage as the kernel-generation code (Haskell). Further, we show how to use functional programming abstractions to enforce regular (hyper-rectangular) search spaces. We also evaluate several common search strategies on a variety of kernels, and demonstrate that the framework can tune both EDSL and ordinary CUDA code.
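The separation between a tunable kernel and the strategy that searches its parameters can be illustrated with a small sketch. The framework itself is written in Haskell; this Python version only mirrors the idea under assumptions: the search space is a regular hyper-rectangle given as one list of candidate values per compile-time parameter, the strategy names are hypothetical, and `fake_kernel_time` is a made-up stand-in for compiling and timing a real kernel.

```python
import itertools
import random
from typing import Callable, List, Sequence, Tuple

Params = Tuple[int, ...]
Space = Sequence[Sequence[int]]
Cost = Callable[[Params], float]


def exhaustive(space: Space, cost: Cost) -> Params:
    # Enumerate every point of the hyper-rectangle (Cartesian product of axes).
    return min(itertools.product(*space), key=cost)


def random_search(space: Space, cost: Cost, trials: int = 10, seed: int = 0) -> Params:
    # Sample points independently along each axis, so every sample stays in the space.
    rng = random.Random(seed)
    candidates: List[Params] = [tuple(rng.choice(axis) for axis in space) for _ in range(trials)]
    return min(candidates, key=cost)


def tune(space: Space, cost: Cost, strategy: Callable[[Space, Cost], Params]) -> Params:
    # The kernel description (space + cost) does not depend on the search strategy.
    return strategy(space, cost)


def fake_kernel_time(p: Params) -> float:
    # Hypothetical cost model: pretends block size 128 and unroll factor 4 are best.
    block_size, unroll = p
    return abs(block_size - 128) / 32 + abs(unroll - 4)


SPACE = [[32, 64, 128, 256, 512], [1, 2, 4, 8]]  # (block size, unroll factor)
```

For instance, `tune(SPACE, fake_kernel_time, exhaustive)` evaluates all 20 points and returns `(128, 4)`, while passing `random_search` instead trades completeness for fewer evaluations; swapping strategies requires no change to the kernel side.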

Schism: A Self-Hosting Scheme to WebAssembly Compiler

Holk

2018

Schism is a small, self-hosting compiler from a subset of R6RS Scheme to WebAssembly, a new portable low-level binary format primarily targeting Web applications. The compiler was under one thousand lines of code when it first became self-hosting, and has since grown to support additional Scheme features. While currently far from a complete Scheme, Schism supports basic control flow, most basic data types, and first-class procedures. Schism provides an example of a small implementation of a language targeting WebAssembly and demonstrates techniques that may be useful to other language implementors. Scheme, a dynamically typed functional programming language, is markedly different from languages with good WebAssembly support, like C and C++, so Schism shows that WebAssembly has achieved its goal of being able to support a variety of languages. Still, Schism would greatly benefit from new capabilities in WebAssembly, such as a proper tail call instruction and garbage collection. Given Schism's small size, it is well-positioned to provide early implementation experience to the WebAssembly standardization process for these new features. In this paper we discuss the design and implementation of Schism, including compromises made to enable a quick and small implementation, as well as plans for future development of Schism and its influence on the WebAssembly standard.
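To give a flavor of targeting WebAssembly's stack machine, here is a toy compiler from Scheme-style arithmetic expressions (written as Python tuples) to the WebAssembly text format. It is a sketch under simplifying assumptions and not Schism's actual code generator: values are untagged 32-bit integers, only `+`, `-`, and `*` with two or more operands are supported, and the names `compile_expr` and `compile_module` are hypothetical. A real Scheme compiler must also represent dynamically typed values, procedures, and tail calls.

```python
from typing import List, Union

Expr = Union[int, tuple]

# Map Scheme arithmetic operators to WebAssembly i32 instructions.
OPS = {"+": "i32.add", "-": "i32.sub", "*": "i32.mul"}


def compile_expr(expr: Expr) -> List[str]:
    """Compile an arithmetic expression tree to a list of WAT stack instructions."""
    if isinstance(expr, int):
        return [f"i32.const {expr}"]
    op, *args = expr
    if op not in OPS or len(args) < 2:
        raise ValueError(f"unsupported form: {expr!r}")
    # Push the first operand, then fold the remaining operands left to right,
    # so (- 10 3 2) computes (10 - 3) - 2 as in Scheme.
    code = compile_expr(args[0])
    for arg in args[1:]:
        code += compile_expr(arg)
        code.append(OPS[op])
    return code


def compile_module(expr: Expr) -> str:
    # Wrap the instructions in a module exporting a nullary function "main".
    body = "\n    ".join(compile_expr(expr))
    return f'(module\n  (func (export "main") (result i32)\n    {body}))'
```

For `("+", 1, ("*", 2, 3))`, the sketch emits `i32.const 1`, `i32.const 2`, `i32.const 3`, `i32.mul`, `i32.add`: operands are pushed onto the operand stack and each instruction consumes its inputs, which is why a post-order walk of the expression tree suffices.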