Functional and dynamic programming in the design of parallel prefix networks

Sheeran, Mary

doi:10.1017/s0956796810000304

Cited by 13 publications

(15 citation statements)

References 32 publications


“…An array of length n is processed by n/2 threads, except in the case of Kogge-Stone where n threads are used. Sheeran confirms this claim when exploiting the result in later work on the design of prefix sum algorithms [33]. We refer to this as the Sheeran result.…”

Section: Related Work (mentioning)

confidence: 54%
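
For readers unfamiliar with the Kogge-Stone pattern named in the statement above, the following is a minimal Python sketch, not code from Sheeran or from the citing paper. The function name `kogge_stone_scan` and its parameters are hypothetical. Each round combines every position with the value `d` places to its left, so a round can involve up to one operation per element, which is consistent with the quoted remark that Kogge-Stone uses n threads. On parallel hardware the updates within a round would run simultaneously; here they are simulated with a list comprehension.

```python
from operator import add


def kogge_stone_scan(xs, op=add):
    """Inclusive prefix scan following the Kogge-Stone pattern.

    `op` is assumed to be associative; it need not be commutative.
    """
    ys = list(xs)
    d = 1
    while d < len(ys):
        # One round: every position i >= d combines with the value d places
        # to its left. The new list is built from the old one, mimicking a
        # synchronous parallel step.
        ys = [ys[i] if i < d else op(ys[i - d], ys[i]) for i in range(len(ys))]
        d *= 2
    return ys


if __name__ == "__main__":
    print(kogge_stone_scan([1, 2, 3, 4, 5, 6, 7, 8]))
    # [1, 3, 6, 10, 15, 21, 28, 36]
```

The sketch performs about lg(n) rounds for an input of length n, which is the depth usually attributed to this network.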

“…Our second main contribution is an extension of our method and theoretical results to the case of barrier-synchronising data-parallel programs, the programming model of GPU kernels. This is a contribution over previous work on the correctness of parallel prefix sums [33,38] which applies to synchronous parallel hardware described as sequential HASKELL programs, but not to asynchronous concurrent programs. We show that if a data-parallel program implementing a generic prefix sum can be proved free from data races then correctness of the prefix sum can be established by running a single test case using the interval of summations monoid, as in the sequential case.…”

Section: Data-type Operator (mentioning)

confidence: 96%

A sound and complete abstraction for reasoning about parallel prefix sums

Chong

Donaldson

Ketema

2014

Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages

Prefix sums are key building blocks in the implementation of many concurrent software applications, and recently much work has gone into efficiently implementing prefix sums to run on massively parallel graphics processing units (GPUs). Because they lie at the heart of many GPU-accelerated applications, the correctness of prefix sum implementations is of prime importance. We introduce a novel abstraction, the interval of summations, that allows scalable reasoning about implementations of prefix sums. We present this abstraction as a monoid, and prove a soundness and completeness result showing that a generic sequential prefix sum implementation is correct for an array of length n if and only if it computes the correct result for a specific test case when instantiated with the interval of summations monoid. This allows correctness to be established by running a single test where the input and result require O(n lg(n)) space. This improves upon an existing result by Sheeran where the input requires O(n lg(n)) space and the result O(n^2 lg(n)) space, and is more feasible for large n than a method by Voigtländer that uses O(n) space for the input and result but requires running O(n^2) tests. We then extend our abstraction and results to the context of data-parallel programs, developing an automated verification method for GPU implementations of prefix sums. Our method uses static verification to prove that a generic prefix sum implementation is data race-free, after which functional correctness of the implementation can be determined by running a single test case under the interval of summations abstraction. We present an experimental evaluation using four different prefix sum algorithms, showing that our method is highly automatic, scales to large thread counts, and significantly outperforms Voigtländer's method when applied to large arrays.
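
The single-test idea in this abstract can be illustrated with a small Python sketch. It is a simplified reading of the interval of summations, not the authors' definition or tooling: an element is assumed to be either a pair `(i, j)` standing for the contiguous sum of inputs i through j, or an absorbing marker `TOP` for anything that is not such a sum, and the monoid's identity element is omitted for brevity. All names (`TOP`, `combine`, `passes_interval_test`, `sequential_scan`, `reversed_scan`) are hypothetical.

```python
TOP = "TOP"  # assumed absorbing element: "not a contiguous summation"


def combine(a, b):
    """Join two adjacent intervals; anything else collapses to TOP."""
    if a == TOP or b == TOP:
        return TOP
    (i, j), (k, l) = a, b
    return (i, l) if j + 1 == k else TOP


def passes_interval_test(scan, n):
    """Run one test: element i of the input is the interval (i, i)."""
    inputs = [(i, i) for i in range(n)]
    expected = [(0, i) for i in range(n)]
    return scan(inputs, combine) == expected


def sequential_scan(xs, op):
    """Reference inclusive prefix scan, generic in the operator."""
    out = []
    for x in xs:
        out.append(x if not out else op(out[-1], x))
    return out


def reversed_scan(xs, op):
    """Deliberately wrong: combines operands in the wrong order."""
    out = []
    for x in xs:
        out.append(x if not out else op(x, out[-1]))
    return out


if __name__ == "__main__":
    print(passes_interval_test(sequential_scan, 8))  # True
    print(passes_interval_test(reversed_scan, 8))    # False
```

The reversed variant would go unnoticed under ordinary integer addition, because addition is commutative; under the interval operator, the misordered operands are not adjacent and the result collapses to `TOP`.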

“…In Java, a network topology is described in the form of loops and recursive function calls. Sheeran [35] proposed similar but more generalized approach for prefix network generation from abstract specification, by modeling prefix networks using functional programming language, Haskell. In our case, the network construction in Java is more practical, as the objective is solely to generate RTL descriptions for existing prefix networks, and to evaluate their performance on FPGA.…”

Section: Area/speed Results For Max-prefix Network (mentioning)

confidence: 99%

Combining execution pipelines to improve parallel implementation of HMMER on FPGA

Abbas

Derrien

Rajopadhye

et al. 2015

Microprocessors and Microsystems

HMMER is a widely used tool in bioinformatics, based on Profile Hidden Markov Models. The computation kernels of HMMER, namely MSV and P7Viterbi, are very compute-intensive, and their data dependencies, if interpreted naively, lead to a purely sequential execution. In this paper, we propose an original parallelization scheme for HMMER that rewrites the mathematical formulation to expose hidden parallelization opportunities. Our parallelization scheme targets FPGA technology, and our architecture can achieve a 10-times speedup compared with the latest HMMER3 SSE version, without compromising the sensitivity of the original algorithm.

“…Sheeran [2011] has used her various platforms to explore different kinds of circuits and has shown that rapid feedback in the form of simulation, testing, and model checking is most valuable to the designer. Johnson and Bose [1997] and Seger et al [2005] make similar observations about their refinement efforts.…”

Section: Discussion (mentioning)

confidence: 99%

“…They have also been used to analyze many other combinational circuits, such as adders and multipliers [Axelsson 2003], and as a host for a sequential language much simpler than what we discuss in Section 3.1 [Claessen 2001, Chapter 6]. More recently, Sheeran [2005, 2011] has developed techniques for context-sensitive circuit generators and optimisers using this system.…”

Section: Lava 2000 (mentioning)

confidence: 99%

Synchronous digital circuits as functional programs

Gammie

2013

ACM Comput. Surv.

Functional programming techniques have been used to describe synchronous digital circuits since the early 1980s. Here we survey the systems and formal underpinnings that constitute this tradition. We situate these techniques with respect to other formal methods for hardware design and discuss the work yet to be done. Hardware designs traverse a series of abstraction layers: what might begin as a high-level behavioural model that addresses architectural issues will, when mature, typically be manually translated into a Register-Transfer Level (RTL) description that captures how the high-level computations are performed by the finite-state means of logic gates and memories. This is typically validated against the original model using simulation and testing, or more formally with model-checking techniques or a proof assistant. The resulting netlists (circuit schematics represented as graphs) are semiautomatically mapped to an implementation technology and laid out for realisation in silicon. The original motivation for developing Domain-Specific Languages (DSLs) [Mernik et al. 2005] for the upper reaches of this process was to harness the huge increases in transistor densities on silicon chips forecast by Moore's law [Mead and Conway 1980]. It was hoped that productivity would rise with the abstraction level, yielding designs that were more reusable, scalable, and correct. Traditional imperative programming languages were a poor fit, as their implicit sequentiality conflicts with the intrinsic parallelism of hardware, and a global store is in tension with the ideal of placing computations physically near the relevant state [Nikhil 2011]. For these reasons, I thank
