Constructive Synthesis of Memory-Intensive Accelerators for FPGA From Nested Loop Kernels

Milford, Matthew; McAllister, John

doi:10.1109/tsp.2016.2566608

Cited by 4 publications

(2 citation statements)

References 26 publications


“…Given T1 critical and T2 critical definitions, exposing the inner-loop parallelism gives a significantly shorter execution time. In addition, exposing such loop structures in dataflow is also relevant in the context of Field Programmable Gate Array implementation of nested loop kernels [16]. Thus, in the following, we only consider the exposed dataflow representation of the inner-loop.…”

Section: Improved Conciseness and Memory Efficiency (mentioning)

confidence: 99%

Delays and states in dataflow models of computation

Arrestier, Desnos, Pelcat, et al. 2018

Proceedings of the 18th International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation


Dataflow Models of Computation (MoCs) have proven to be an efficient means of modeling computational aspects of Cyber-Physical Systems (CPS). Over the years, diverse MoCs have been proposed that offer trade-offs between expressivity, conciseness, predictability, and reconfigurability. While efficient for modeling coarse-grain data and task parallelism, state-of-the-art dataflow MoCs lack the semantics needed to benefit from the finer-grained parallelism offered by hierarchically modeled nested loops. State-Aware Dataflow (SAD) extends the semantics of the targeted MoC with an unambiguous data persistence scope. The extended expressiveness and conciseness brought by the SAD meta-model are demonstrated with a reinforcement learning use case.


“…We do so by investigating the use of the existing embedded ecosystem on modern FPGAs to handle IO data channels in software. Although this approach may not offer the same performance when compared to automated custom configuration methods [10], and far less effective relative to manual RTL design, it still offers design times that are comparable to an HLS approach. More importantly, it allows for data transfers to be handled in software.…”

Section: Introduction (mentioning)

confidence: 98%

IO and data management for infrastructure as a service FPGA accelerators

Moorthy, Gopalakrishnan 2017

J Cloud Comp


We describe the design of a non-operating-system based embedded system to automate the management, reordering, and movement of data produced by FPGA accelerators within data centre environments. In upcoming cloud computing environments, where FPGA acceleration may be leveraged via Infrastructure as a Service (IaaS), end users will no longer have full access to the underlying hardware resources. We envision a partially reconfigurable FPGA region that end users can access for their custom acceleration needs, and a static "template" region offered by the data centre to manage all Input/Output (IO) data requirements to the FPGA. Thus our low-level software-controlled system allows for standard DDR access to off-chip memory, as well as DMA movement of data to and from SATA-based SSDs, and access to Ethernet stream links. Two use cases of FPGA accelerators are presented as experimental examples to demonstrate the area and performance costs of integrating our data-management system alongside such accelerators. Comparisons are also made to fully custom data management solutions implemented solely in RTL Verilog to determine the trade-offs in using our system with regard to development time, area, and performance. We find that for a class of accelerators in which the physical data rate of an IO channel is the limiting bottleneck to accelerator throughput, our solution offers drastically reduced logic development time spent on data management without any associated performance loss. However, for a class of applications where the IO channel is not the bottleneck, our solution trades off increased area usage to save on design time and to maintain acceptable system throughput in the face of degraded IO throughput.


Exploiting Irregular Memory Parallelism in Quasi-Stencils through Nonlinear Transformation

Escobedo, Lin 2019

2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)

