dfesnippets: An Open-Source Library for Dataflow Acceleration on FPGAs

Grigoras, Paul; Burovskiy, Pavel; Arram, James; Niu, Xinyu; Cheung, Kit; Xie, Junyi; Luk, Wayne

doi:10.1007/978-3-319-56258-2_26

Cited by 3 publications

(3 citation statements)

References 32 publications

(37 reference statements)

“…The maximum value of the innermost loop is received as input stream. Therefore, a buffer is used to hide the input latency of the stream, as inspired by the dfesnippets library [19], to allow for an efficient dataflow implementation. Consequently, the number of total operation ticks is increased by 4 (the input latency), however, this is a negligible increase compared to the overall operation ticks and the benefit of buffering.…”

Section: Gap Junctions (mentioning)

confidence: 99%

flexHH: A Flexible Hardware Library for Hodgkin-Huxley-Based Neural Simulations

et al. 2020

The Hodgkin-Huxley (HH) neuron is one of the most biophysically meaningful models used in computational neuroscience today. Ironically, the model's high experimental value is offset by its disproportionate computational complexity, to such an extent that neuroscientists have resorted either to simpler models, losing precious neuron detail, or to high-performance computing systems to accelerate complex models. However, multicore/multinode CPU-based systems have proven too slow, while FPGA-based ones have proven too time-consuming to (re)deploy to. Clearly, a solution that combines user friendliness with high speedups is necessary. This paper presents flexHH, a flexible FPGA library implementing five popular, highly parameterizable variants of the HH neuron model. flexHH is the first crucial step towards making FPGA-based simulations of compute-intensive neural models available to neuroscientists without the debilitating penalty of re-engineering and re-synthesis. Through flexHH, the user can instantiate custom models and immediately take advantage of the acceleration without the mediation of an engineer, a dependency that has proven to be a major inhibitor to full adoption of FPGAs in neuroscience labs. In terms of performance, flexHH achieves speedups of 8× to 20× compared to sequential C implementations, with only a small drop in real-time capabilities compared to hardcoded FPGA-based versions of the models tested.

“…The accumulator included the reduction circuits and an adder tree, which was introduced to eliminate out-of-order outputs and reduce buffering requirements of the reduction circuits. Wayne [11] proposed an open-source library for Dataflow acceleration on FPGAs, in which the partially compacted binary reduction tree was introduced. And the state machine was used to enable the PCBT to stall but preserve the intermediate results if necessary.…”

Section: Background and Related Work (mentioning)

confidence: 99%

“…In fact, the basic architectures of the reduction circuits [10], [11], [18] were not modified, but the additional circuits were introduced to improve the functionality of the reduction circuit.…”

Section: Background and Related Work (mentioning)

confidence: 99%

A Tag Based Random Order Vector Reduction Circuit

Huang, Chen, et al. 2020

IEEE Access

Vector reduction, which reduces a vector to a single scalar value, is a very common operation in many scientific and engineering applications. A fast and efficient vector reduction circuit is therefore of great significance to real-time systems. A pipelined structure is widely adopted to increase the throughput of the vector reduction circuit and achieve maximum efficiency. In this paper, to deal with multiple vectors of variable length arriving in random input order, a novel tag-based, fully pipelined vector reduction circuit is first proposed, in which a cache state module is used to query and update the cache state of each vector. However, when the number of input vectors becomes large, a larger cache state module is required, which consumes more combinational logic and lowers the operating frequency. To solve this problem, a high-speed circuit is proposed in which the input vectors are divided into several groups and sent to dedicated cache state circuits, which improves the operating frequency. Compared with other existing work, the prototype circuit and the improved circuit based on it achieve the smallest Slices×us (<80% of the state-of-the-art work) for different input vector lengths. Moreover, both circuits provide a simple and efficient interface whose access timing is similar to that of a RAM, so the circuits can be applied in a wider range of settings.

Index Terms: Field-programmable gate arrays, fully pipelined, vector reduction.
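As a rough software analogy for the problem this abstract describes (several variable-length vectors whose elements arrive interleaved, each element carrying a tag), the sketch below keeps one partial sum per tag and emits a result when a vector is complete. It is not the circuit from the paper: the function name `reduce_tagged_stream`, the `(tag, value)` stream format, and the assumption that each vector's length is known in advance are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def reduce_tagged_stream(stream, lengths):
    """Sum interleaved (tag, value) pairs, one running total per tag.

    `lengths` maps each tag to the number of elements in its vector
    (an assumption of this sketch). A (tag, total) pair is emitted as
    soon as the last element of that vector has been consumed.
    """
    partial = defaultdict(float)  # running sum per tag
    seen = defaultdict(int)       # elements received so far per tag
    results = []
    for tag, value in stream:
        partial[tag] += value
        seen[tag] += 1
        if seen[tag] == lengths[tag]:
            # Vector complete: emit its total and free the per-tag state.
            results.append((tag, partial.pop(tag)))
            del seen[tag]
    return results


# Two vectors, "a" (3 elements) and "b" (2 elements), arriving interleaved.
stream = [("a", 1.0), ("b", 10.0), ("a", 2.0), ("b", 20.0), ("a", 3.0)]
lengths = {"a": 3, "b": 2}
results = reduce_tagged_stream(stream, lengths)
```

In this toy version the per-tag dictionaries play the role of the state that must be looked up and updated for every incoming element, which gives some intuition for why that state grows with the number of concurrently active vectors.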

Reconfigurable Acceleration of Short Read Mapping with Biological Consideration

Coleman, Liu, et al. 2021

The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
