2016
DOI: 10.1109/tsp.2016.2566608

Constructive Synthesis of Memory-Intensive Accelerators for FPGA From Nested Loop Kernels

Abstract: Field programmable gate arrays are ideal hosts for custom accelerators for signal, image and data processing, but demand manual register transfer level design if high performance and low cost are desired. High level synthesis reduces this design burden but requires manual design of complex on-chip and off-chip memory architectures, a major limitation in applications such as video processing. This paper presents an approach to resolve this shortcoming. A constructive process is described which can derive such acce…

Cited by 4 publications
(2 citation statements)
References 26 publications
“…Given T1 critical and T2 critical definitions, exposing the innerloop parallelism gives a significantly shorter execution time. In addition, exposing such loop structures in dataflow is also relevant in the context of Field Programmable Gate Array implementation of nested loop kernels [16]. Thus, in the following, we only consider the exposed dataflow representation of the inner-loop.…”
Section: Improved Conciseness and Memory Efficiency (mentioning)
confidence: 99%
“…We do so by investigating the use of the existing embedded ecosystem on modern FPGAs to handle IO data channels in software. Although this approach may not offer the same performance when compared to automated custom configuration methods [10], and far less effective relative to manual RTL design, it still offers design times that are comparable to an HLS approach. More importantly, it allows for data transfers to be handled in software.…”
Section: Introduction (mentioning)
confidence: 98%