2011
DOI: 10.1007/978-3-642-19475-7_7

Automatic Generation of FPGA-Specific Pipelined Accelerators

Abstract: The recent increase in circuit complexity has made high-level synthesis tools a necessity in digital circuit design. However, these tools come with several limitations, one of which concerns the efficient use of pipelined arithmetic operators. This paper explains how to generate efficient hardware with pipelined operators for regular codes with perfect loop nests. The part to be mapped to the operator is identified, then the program is scheduled so that each operator result is available exac…
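As a hypothetical illustration (not taken from the paper), the kind of regular code with a perfect loop nest that such a flow targets can be written as a small C kernel; the floating-point work in the innermost body would be mapped onto pipelined operators, and the schedule must make each operator result available exactly when the next operation needs it.

```c
/* Illustrative only: a regular code with a perfect loop nest, the kind of
 * kernel an HLS flow with pipelined operators could target. Names and sizes
 * are arbitrary assumptions, not from the paper. */
#define N 256

void stencil_2d(const float in[N][N], float out[N][N])
{
    /* Perfect loop nest: every statement sits in the innermost loop body. */
    for (int i = 1; i < N - 1; i++) {
        for (int j = 1; j < N - 1; j++) {
            /* This 5-point update would be mapped onto pipelined
             * floating-point adders and a multiplier. */
            out[i][j] = 0.2f * (in[i][j] + in[i - 1][j] + in[i + 1][j]
                                + in[i][j - 1] + in[i][j + 1]);
        }
    }
}
```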

Cited by 6 publications (4 citation statements) | References 18 publications
“…Step 1, scheduling an RPN to maximize the throughput, is still an open problem. However, we propose a partial solution for a single process, able to reduce bubbles in the arithmetic datapath [3,4]. Steps 2 and 3 have been partially addressed in the context of PPN [39].…”
Section: Compilation Methodology
confidence: 99%
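A common way to reduce such bubbles, sketched below under the assumption of a pipelined floating-point adder with a fixed latency (the concrete latency and kernel are illustrative assumptions, not the cited works' implementation), is to rotate a reduction over as many independent partial sums as the adder has pipeline stages, so a new addition can issue every cycle.

```c
/* Minimal sketch: hiding the latency of a pipelined floating-point adder in a
 * reduction by rotating over ADD_LATENCY independent partial sums, so the
 * adder accepts one new operation per cycle instead of stalling on the
 * loop-carried dependence. ADD_LATENCY is an assumed value. */
#define ADD_LATENCY 8

float sum_no_bubbles(const float *x, int n)
{
    float partial[ADD_LATENCY] = {0.0f};

    /* Consecutive iterations accumulate into different partial sums, so the
     * additions are independent and can overlap in the adder pipeline. */
    for (int i = 0; i < n; i++)
        partial[i % ADD_LATENCY] += x[i];

    /* Short final combination of the partial sums. */
    float sum = 0.0f;
    for (int k = 0; k < ADD_LATENCY; k++)
        sum += partial[k];
    return sum;
}
```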
“…Other solutions include the use of numerical kernels implemented in FPGAs, such as [2]. These contain distributable computing cores, and parallelisation is achieved using multiple, statically scheduled allocations.…”
Section: Matrix-vector Multiplication
confidence: 99%
“…On this note, the algorithm shown in Fig. 1 represents a natural parallel/serial partition for a matrix-vector multiplication, and the basis of many applications [52,8,68,2]. (Feedback is considered herein as the computational dependence given by X(k) as k → k + 1.) Various distinct architectures are obtained by the use of different values for k, memory allocation (i.e.…”
Section: Reduction Algorithms
confidence: 99%
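A generic matrix-vector multiplication written this way (a hedged sketch, not a reproduction of the cited figure) makes the partition visible: the outer loop over rows is naturally parallel, while the inner accumulation carries the serial feedback dependence from step k to step k + 1.

```c
/* Illustrative sketch of the parallel/serial partition in y = A * x.
 * Dimensions and names are assumptions for the example. */
#define ROWS 128
#define COLS 128

void matvec(const float A[ROWS][COLS], const float x[COLS], float y[ROWS])
{
    /* Outer loop: independent across rows, so it can be distributed over
     * parallel computing cores. */
    for (int i = 0; i < ROWS; i++) {
        float acc = 0.0f;
        /* Inner loop: a serial reduction; acc at step k feeds step k + 1,
         * which is the feedback dependence discussed above. */
        for (int k = 0; k < COLS; k++)
            acc += A[i][k] * x[k];
        y[i] = acc;
    }
}
```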
“…In fact, the existing approaches for the hardware implementation of ISL algorithms either apply generic and ineffective optimizations, or impose very strict and limiting constraints. For instance, the work in [9] proposes a methodology to generate a hardware pipeline that spans across multiple iterations, but it is limited to only one floating-point operation per iteration, and no design space exploration is possible as the depth of the pipeline is uniquely determined. Conversely, generic HLS tools such as Xilinx Vivado [25] or Synopsys Synphony C Compiler [24] are able to handle any instance of ISL algorithms, but they perform a set of predefined and general purpose array and loop optimizations (unrolling, merging, flattening, pipelining, array partitioning, etc.)…”
Section: State-of-the-art Implementations
confidence: 99%
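As a hedged illustration of the directive-driven optimizations those generic HLS tools apply (the kernel and the chosen factors are assumptions for the example, not prescriptions from the cited tools' documentation), a loop can be annotated with Vivado HLS-style pragmas for pipelining, partial unrolling, and array partitioning:

```c
/* Sketch of directive-driven loop optimization in a Vivado HLS-style flow.
 * The kernel and the chosen factors are illustrative assumptions. */
#define N 1024

void scale_add(const float a[N], const float b[N], float out[N], float alpha)
{
    /* Split the arrays across memory banks so the unrolled iterations can
     * access several elements per cycle. */
#pragma HLS ARRAY_PARTITION variable=a cyclic factor=4 dim=1
#pragma HLS ARRAY_PARTITION variable=b cyclic factor=4 dim=1
#pragma HLS ARRAY_PARTITION variable=out cyclic factor=4 dim=1

    for (int i = 0; i < N; i++) {
        /* Predefined, general-purpose optimizations: initiate one pipelined
         * iteration per cycle and partially unroll by four. */
#pragma HLS PIPELINE II=1
#pragma HLS UNROLL factor=4
        out[i] = alpha * a[i] + b[i];
    }
}
```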