Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays 2020
DOI: 10.1145/3373087.3375314

Buffer Placement and Sizing for High-Performance Dataflow Circuits

Abstract: Commercial high-level synthesis tools typically produce statically scheduled circuits. Yet, effective C-to-circuit conversion of arbitrary software applications calls for dataflow circuits, as they can handle efficiently variable latencies (e.g., caches) and unpredictable memory dependencies. Dataflow circuits exhibit an unconventional property: registers (usually referred to as "buffers") can be placed anywhere in the circuit without changing its semantics, in strong contrast to what happens in traditional da…

Cited by 31 publications (13 citation statements)
References 31 publications
“…We developed an optimization approach [37] which allows for resource-optimal buffer placement and sizing, with the purpose of maximizing throughput of the performance-critical loops at the desired clock frequency. Our optimization strategy consists out of two main steps, as illustrated in Algorithm 2:…”
Section: Buffers and Performance
Citation type: mentioning
confidence: 99%
“…The spatial CGRA mapping is challenging because we should balance all pipeline paths by inserting queues after mapping. The number of inserted queues can significantly impact architecture cost and throughput [24]. In this work, we also evaluate an asynchronous CGRA model, where unbalanced paths do not require registers, although throughput degradation may occur, as is showed in subsection II-D.…”
Section: Reshape - Architecture-independent Mapping
Citation type: mentioning
confidence: 99%
“…One approach to avoid FIFOs is asynchronous data-flow mapping. A PE can process an operation if and only if all input data are available at the correct time frame [24], [26]. The throughput can be smaller than one.…”
Section: Asynchronous Data-flow
Citation type: mentioning
confidence: 99%
“…Dataflow circuits are fundamentally different: their schedules are not predetermined at compile time but devised as the circuit runs. Moreover, Lana [19,20] investigates how to create timing-efficient, high-throughput pipelines, and their MILP model is based on the theory of marked graphs and allows for resource-optimal buffer placement and sizing, with the purpose of maximizing throughput at the desired clock frequency. However, they are purely theoretical optimizations of the computational model without abstracting a generalized computational template for the computational model, which still requires a complete understanding of the circuit structure and does not improve the user's coding efficiency.…”
Section: Relate Work
Citation type: mentioning
confidence: 99%