2011
DOI: 10.1007/978-3-642-19475-7_7

Automatic Generation of FPGA-Specific Pipelined Accelerators

Abstract: The recent increase in circuit complexity has made high-level synthesis tools a necessity in digital circuit design. However, these tools come with several limitations, one of which concerns the efficient use of pipelined arithmetic operators. This paper explains how to generate efficient hardware with pipelined operators for regular codes with perfect loop nests. The part to be mapped to the operator is identified, then the program is scheduled so that each operator result is available exac…
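As a hypothetical illustration (not taken from the paper), the kind of regular code with a perfect loop nest that such a flow targets can be written as a small C kernel; the floating-point work in the innermost body would be mapped onto pipelined operators, and the schedule must make each operator result available exactly when the next operation needs it.

```c
/* Illustrative only: a regular code with a perfect loop nest, the kind of
 * kernel an HLS flow with pipelined operators could target. Names and sizes
 * are arbitrary assumptions, not from the paper. */
#define N 256

void stencil_2d(const float in[N][N], float out[N][N])
{
    /* Perfect loop nest: every statement sits in the innermost loop body. */
    for (int i = 1; i < N - 1; i++) {
        for (int j = 1; j < N - 1; j++) {
            /* This 5-point update would be mapped onto pipelined
             * floating-point adders and a multiplier. */
            out[i][j] = 0.2f * (in[i][j] + in[i - 1][j] + in[i + 1][j]
                                + in[i][j - 1] + in[i][j + 1]);
        }
    }
}
```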

Cited by 6 publications (4 citation statements) | References 18 publications
“…Step 1, scheduling an RPN to maximize the throughput, is still an open problem. However, we propose a partial solution for a single process, able to reduce bubbles in the arithmetic datapath [3,4]. Steps 2 and 3 have been partially addressed in the context of PPN [39].…”
Section: Compilation Methodology
confidence: 99%
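A common way to reduce such bubbles, sketched below under the assumption of a pipelined floating-point adder with a fixed latency (the concrete latency and kernel are illustrative assumptions, not the cited works' implementation), is to rotate a reduction over as many independent partial sums as the adder has pipeline stages, so a new addition can issue every cycle.

```c
/* Minimal sketch: hiding the latency of a pipelined floating-point adder in a
 * reduction by rotating over ADD_LATENCY independent partial sums, so the
 * adder accepts one new operation per cycle instead of stalling on the
 * loop-carried dependence. ADD_LATENCY is an assumed value. */
#define ADD_LATENCY 8

float sum_no_bubbles(const float *x, int n)
{
    float partial[ADD_LATENCY] = {0.0f};

    /* Consecutive iterations accumulate into different partial sums, so the
     * additions are independent and can overlap in the adder pipeline. */
    for (int i = 0; i < n; i++)
        partial[i % ADD_LATENCY] += x[i];

    /* Short final combination of the partial sums. */
    float sum = 0.0f;
    for (int k = 0; k < ADD_LATENCY; k++)
        sum += partial[k];
    return sum;
}
```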
“…Other solutions include the use of numerical kernels implemented in FPGAs, such as [2]. These contain distributable computing cores, and parallelisation is achieved using multiple, statically scheduled allocations.…”
Section: Matrix-vector Multiplication
confidence: 99%
“…On this note, the algorithm shown in Fig. 1 represents a natural parallel/serial partition for a matrix-vector multiplication, and the basis of many applications [52,8,68,2]. (Feedback is considered herein as the computational dependence given by X(k) as k → k + 1.) Various distinct architectures are obtained by the use of different values for k, memory allocation (i.e.…”
Section: Reduction Algorithms
confidence: 99%
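A generic matrix-vector multiplication written this way (a hedged sketch, not a reproduction of the cited figure) makes the partition visible: the outer loop over rows is naturally parallel, while the inner accumulation carries the serial feedback dependence from step k to step k + 1.

```c
/* Illustrative sketch of the parallel/serial partition in y = A * x.
 * Dimensions and names are assumptions for the example. */
#define ROWS 128
#define COLS 128

void matvec(const float A[ROWS][COLS], const float x[COLS], float y[ROWS])
{
    /* Outer loop: independent across rows, so it can be distributed over
     * parallel computing cores. */
    for (int i = 0; i < ROWS; i++) {
        float acc = 0.0f;
        /* Inner loop: a serial reduction; acc at step k feeds step k + 1,
         * which is the feedback dependence discussed above. */
        for (int k = 0; k < COLS; k++)
            acc += A[i][k] * x[k];
        y[i] = acc;
    }
}
```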
“…In fact, the existing approaches for the hardware implementation of ISL algorithms either apply generic and ineffective optimizations, or impose very strict and limiting constraints. For instance, the work in [9] proposes a methodology to generate a hardware pipeline that spans across multiple iterations, but it is limited to only one floating-point operation per iteration, and no design space exploration is possible as the depth of the pipeline is uniquely determined. Conversely, generic HLS tools such as Xilinx Vivado [25] or Synopsys Synphony C Compiler [24] are able to handle any instance of ISL algorithms, but they perform a set of predefined and general purpose array and loop optimizations (unrolling, merging, flattening, pipelining, array partitioning, etc.)…”
Section: State-of-the-art Implementations
confidence: 99%
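As a hedged illustration of the directive-driven optimizations those generic HLS tools apply (the kernel and the chosen factors are assumptions for the example, not prescriptions from the cited tools' documentation), a loop can be annotated with Vivado HLS-style pragmas for pipelining, partial unrolling, and array partitioning:

```c
/* Sketch of directive-driven loop optimization in a Vivado HLS-style flow.
 * The kernel and the chosen factors are illustrative assumptions. */
#define N 1024

void scale_add(const float a[N], const float b[N], float out[N], float alpha)
{
    /* Split the arrays across memory banks so the unrolled iterations can
     * access several elements per cycle. */
#pragma HLS ARRAY_PARTITION variable=a cyclic factor=4 dim=1
#pragma HLS ARRAY_PARTITION variable=b cyclic factor=4 dim=1
#pragma HLS ARRAY_PARTITION variable=out cyclic factor=4 dim=1

    for (int i = 0; i < N; i++) {
        /* Predefined, general-purpose optimizations: initiate one pipelined
         * iteration per cycle and partially unroll by four. */
#pragma HLS PIPELINE II=1
#pragma HLS UNROLL factor=4
        out[i] = alpha * a[i] + b[i];
    }
}
```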