As FPGAs reach broader application domains, the conversion of imperative languages into dataflow circuits has recently been revived as a way to overcome the conservatism of statically scheduled high-level synthesis. Apart from the ability to extract parallelism in irregular and control-dominated applications, dynamic scheduling opens the door to speculative execution, one of the most powerful ideas in computer architecture. Speculation allows certain operations to execute before it is known whether they are correct or required: it can significantly increase fine-grain parallelism in loops whose condition takes many cycles to compute, and it can increase the performance of circuits limited by potential dependencies by assuming independence early on and reverting to correct execution whenever the prediction turns out to be wrong. In this work, we detail our methodology for enabling tentative and reversible execution in dynamically scheduled dataflow circuits. We create a generic framework for handling speculation in dataflow circuits and show that our approach can achieve significant performance improvements over traditional circuit generation techniques.
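For context, a minimal C sketch of the kind of loop the abstract alludes to (the kernel and its names are illustrative, not taken from the work itself): the loop-exit decision depends on a long-latency floating-point accumulation, so a statically scheduled pipeline must wait for the comparison to resolve before starting the next iteration, whereas a speculative dataflow circuit can issue further iterations tentatively and squash them if the continue/exit prediction was wrong.

    /* Illustrative kernel (not from the paper): the exit condition depends on
       a multi-cycle floating-point addition on the loop-carried path, which
       serializes iterations unless the circuit speculates that the loop
       continues. */
    float threshold_sum(const float a[], int n, float threshold)
    {
        float sum = 0.0f;
        for (int i = 0; i < n; ++i) {
            sum += a[i] * a[i];   /* long-latency FP add feeding the condition */
            if (sum > threshold)  /* exit decision known only many cycles later */
                break;
        }
        return sum;
    }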
When applications have unpredictable memory accesses or irregular control flow, dataflow circuits overcome the limitations of statically scheduled high-level synthesis (HLS). If memory dependences cannot be determined at compile time, dataflow circuits rely on load-store queues (LSQs) to resolve them dynamically, as the circuit runs. However, on reconfigurable platforms, these LSQs are expensive in resources, slow, and power-hungry. In this work, we explore techniques for reducing the cost of the memory interface in dataflow designs. Beyond exploiting standard memory analysis techniques, we present a novel approach that relies on the topology of the control-flow and dataflow graphs to infer memory ordering, with the purpose of minimizing LSQ size and complexity. On benchmarks obtained automatically from C code, we show that our approach yields significant area reductions, as well as increased performance, compared to naive solutions.
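As an illustration of the kind of dependence an LSQ resolves (a hypothetical kernel, not one named in the abstract): consecutive iterations read and write memory through data-dependent addresses, so whether two iterations conflict is only known at run time.

    /* Hypothetical kernel: hist[] is accessed through run-time indices, so a
       possible read-after-write dependence between iterations i and i+1 can
       neither be proven nor disproven at compile time. A load-store queue
       resolves the actual access order dynamically, letting independent
       accesses proceed early instead of serializing every iteration. */
    void histogram(const int idx[], const float weight[], float hist[], int n)
    {
        for (int i = 0; i < n; ++i)
            hist[idx[i]] += weight[i];
    }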
Commercial high-level synthesis tools typically produce statically scheduled circuits. Yet, effective C-to-circuit conversion of arbitrary software applications calls for dataflow circuits, as they can efficiently handle variable latencies (e.g., caches), unpredictable memory dependencies, and irregular control flow. Dataflow circuits exhibit an unconventional property: registers (usually referred to as “buffers”) can be placed anywhere in the circuit without changing its semantics, in strong contrast to what happens in traditional datapaths. However, although functionally irrelevant, this placement has a significant impact on the circuit’s timing and throughput. In this work, we show how to strategically place buffers into a dataflow circuit to optimize its performance. Our approach extracts a set of choice-free critical loops from arbitrary dataflow circuits and relies on the theory of marked graphs to optimize buffer placement and sizing. Our performance optimization model supports important high-level synthesis features such as pipelined computational units, units with variable latency and throughput, and if-conversion. We demonstrate the performance benefits of our approach on a set of dataflow circuits obtained from imperative code.
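For context, the classical marked-graph result such optimizations build on can be stated as a throughput bound (the notation here is ours, not the paper's): in steady state, a timed marked graph sustains a throughput of at most

    \Theta \le \min_{C \in \mathcal{C}} \frac{M(C)}{D(C)},

where \mathcal{C} is the set of directed cycles of the graph, M(C) is the number of tokens circulating on cycle C, and D(C) is the total latency along C. Roughly speaking, placing and sizing buffers changes the token capacity of the cycles (and, for sequential buffers, their latency) without altering functionality, so an optimizer can search for the placement that maximizes this bound while still meeting the target clock period.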