2019 International Conference on Field-Programmable Technology (ICFPT) 2019
DOI: 10.1109/icfpt47387.2019.00066

Lightweight Programmable DSP Block Overlay for Streaming Neural Network Acceleration


Cited by 2 publications (2 citation statements). References: 9 publications.
“…Architecture centric overlays were shown to offer high throughput since they utilize the underlying FPGA resources more efficiently [30]. The work in [31] described a streaming overlay for fully connected layers utilizing the DSP blocks of a Zynq Ultrascale+ ZU7EV FPGA. That design was shown to achieve close to the theoretical maximum frequency while using minimal resources, but supported only feed-forward networks with the ReLU activation function, and was not shown to scale.…”
Section: Related Work (mentioning; confidence: 99%)
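The following Python sketch roughly illustrates what a streaming fully connected layer with ReLU computes. Inputs arrive one at a time, and each neuron keeps a running multiply-accumulate register, loosely like a DSP block accumulating products. This is not the overlay's actual design. The function names `streaming_fc_relu` and `feed_forward`, the `weights[j][i]` layout, and the use of plain integers to stand in for fixed-point values are all hypothetical assumptions.

```python
from typing import Iterable, List, Sequence, Tuple


def streaming_fc_relu(
    inputs: Iterable[int],
    weights: Sequence[Sequence[int]],  # hypothetical layout: weights[j][i] links input i to neuron j
    biases: Sequence[int],
) -> List[int]:
    """One fully connected layer fed by a stream of inputs, followed by ReLU."""
    # One accumulator per neuron, initialised with its bias
    # (loosely analogous to a DSP accumulator register).
    acc = list(biases)
    # Consume the input stream one element at a time; each arriving input
    # is multiplied by every neuron's weight and added to its accumulator.
    for i, x in enumerate(inputs):
        for j in range(len(acc)):
            acc[j] += weights[j][i] * x
    # ReLU: clamp negative accumulations to zero.
    return [max(0, a) for a in acc]


def feed_forward(
    x: Sequence[int],
    layers: Sequence[Tuple[Sequence[Sequence[int]], Sequence[int]]],
) -> List[int]:
    """Chain layers: each layer's outputs become the next layer's input stream."""
    out = list(x)
    for w, b in layers:
        out = streaming_fc_relu(out, w, b)
    return out


if __name__ == "__main__":
    layer1 = ([[1, 1], [-1, 0]], [0, 0])
    layer2 = ([[1, 1]], [-1])
    print(feed_forward([1, 2], [layer1, layer2]))
```

In this sketch every layer, including the last, applies ReLU. It therefore covers only feed-forward networks with ReLU activations, which is the limitation the statement above attributes to the design.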
“…Ioannou and Fahmy [36] propose a lightweight streaming neural network overlay at RTL-level for FC-ANN models created for the Iris, Diabetes and Customer churn data sets by efficiently taking advantage of DSP blocks, which achieve a frequency which is close to the theoretical maximum frequency of the platform.…”
Section: Function-Level Parallelism in FC-ANN via HLS (mentioning; confidence: 99%)