As FPGAs reach broader application domains, the conversion of imperative languages into dataflow circuits has recently been revived as a way to overcome the conservatism of statically scheduled high-level synthesis. Apart from the ability to extract parallelism in irregular and control-dominated applications, dynamic scheduling opens the door to speculative execution, one of the most powerful ideas in computer architecture. Speculation allows certain operations to execute before it is known whether they are correct or required: it can significantly increase fine-grain parallelism in loops whose condition takes many cycles to compute, and it can increase the performance of circuits limited by potential dependencies by assuming independence early on and reverting to correct execution whenever the prediction turns out to be wrong. In this work, we detail our methodology for enabling tentative and reversible execution in dynamically scheduled dataflow circuits. We create a generic framework for handling speculation in dataflow circuits and show that our approach can achieve significant performance improvements over traditional circuit generation techniques.
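For context, a minimal C sketch of the kind of loop the abstract alludes to (the kernel and its names are illustrative, not taken from the work itself): the loop-exit decision depends on a long-latency floating-point accumulation, so a statically scheduled pipeline must wait for the comparison to resolve before starting the next iteration, whereas a speculative dataflow circuit can issue further iterations tentatively and squash them if the continue/exit prediction was wrong.

    /* Illustrative kernel (not from the paper): the exit condition depends on
       a multi-cycle floating-point addition on the loop-carried path, which
       serializes iterations unless the circuit speculates that the loop
       continues. */
    float threshold_sum(const float a[], int n, float threshold)
    {
        float sum = 0.0f;
        for (int i = 0; i < n; ++i) {
            sum += a[i] * a[i];   /* long-latency FP add feeding the condition */
            if (sum > threshold)  /* exit decision known only many cycles later */
                break;
        }
        return sum;
    }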
When applications have unpredictable memory accesses or irregular control flow, dataflow circuits overcome the limitations of statically scheduled high-level synthesis (HLS). If memory dependences cannot be determined at compile time, dataflow circuits rely on load-store queues (LSQs) to resolve them dynamically, as the circuit runs. However, on reconfigurable platforms, these LSQs are expensive in resources, slow, and power-hungry. In this work, we explore techniques for reducing the cost of the memory interface in dataflow designs. Beyond exploiting standard memory analysis techniques, we present a novel approach that relies on the topology of the control-flow and dataflow graphs to infer memory ordering, with the purpose of minimizing LSQ size and complexity. On benchmarks obtained automatically from C code, we show that our approach yields significant area reductions, as well as increased performance, compared to naive solutions.
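As an illustration of the kind of dependence an LSQ resolves (a hypothetical kernel, not one named in the abstract): consecutive iterations read and write memory through data-dependent addresses, so whether two iterations conflict is only known at run time.

    /* Hypothetical kernel: hist[] is accessed through run-time indices, so a
       possible read-after-write dependence between iterations i and i+1 can
       neither be proven nor disproven at compile time. A load-store queue
       resolves the actual access order dynamically, letting independent
       accesses proceed early instead of serializing every iteration. */
    void histogram(const int idx[], const float weight[], float hist[], int n)
    {
        for (int i = 0; i < n; ++i)
            hist[idx[i]] += weight[i];
    }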
Commercial high-level synthesis tools typically produce statically scheduled circuits. Yet, effective C-to-circuit conversion of arbitrary software applications calls for dataflow circuits, as they can efficiently handle variable latencies (e.g., caches), unpredictable memory dependencies, and irregular control flow. Dataflow circuits exhibit an unconventional property: registers (usually referred to as “buffers”) can be placed anywhere in the circuit without changing its semantics, in strong contrast to what happens in traditional datapaths. However, although functionally irrelevant, this placement has a significant impact on the circuit’s timing and throughput. In this work, we show how to strategically place buffers into a dataflow circuit to optimize its performance. Our approach extracts a set of choice-free critical loops from arbitrary dataflow circuits and relies on the theory of marked graphs to optimize buffer placement and sizing. Our performance optimization model supports important high-level synthesis features such as pipelined computational units, units with variable latency and throughput, and if-conversion. We demonstrate the performance benefits of our approach on a set of dataflow circuits obtained from imperative code.
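For context, the classical marked-graph result such optimizations build on can be stated as a throughput bound (the notation here is ours, not the paper's): in steady state, a timed marked graph sustains a throughput of at most

    \Theta \le \min_{C \in \mathcal{C}} \frac{M(C)}{D(C)},

where \mathcal{C} is the set of directed cycles of the graph, M(C) is the number of tokens circulating on cycle C, and D(C) is the total latency along C. Roughly speaking, placing and sizing buffers changes the token capacity of the cycles (and, for sequential buffers, their latency) without altering functionality, so an optimizer can search for the placement that maximizes this bound while still meeting the target clock period.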