The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays 2021
DOI: 10.1145/3431920.3439284

Demystifying the Memory System of Modern Datacenter FPGAs for Software Programmers through Microbenchmarking

Cited by 20 publications (15 citation statements)
References 21 publications
“…As discussed in Section 2, the platforms we evaluate are the Xilinx Alveo U200 and U280 datacenter FPGA boards [37,38]. We built our HLS C/C++ based microbenchmarks using Xilinx Vitis 2019.2 [39] in [24], and confirm the results using Xilinx Vitis 2020.2 in this work. The FPGA kernels of these microbenchmarks run at 300MHz unless otherwise specified.…”
Section: Methods (mentioning)
confidence: 82%
“…The highly repetitive accesses of simple memory operations in the microbenchmark kernel are treated as dead code and can be optimized away by Vitis HLS. From our previous work in [24], we fix this in Xilinx Vitis 2019.2 [39] by adding the volatile qualifier for those involved variables. In this work, as we move to Xilinx Vitis 2020.2 [39], adding the volatile qualifier alone cannot avoid this issue.…”
Section: Design Challenges and Solutions (mentioning)
confidence: 99%
“…By integrating it into a POWER9 server, the energy consumption has been reduced by 29× comparing to the CPU-only system. To make the advantage of FPGAs+HBM more accessible to software developers, researchers have proposed HLS-based optimizations for fully utilizing the HBM bandwidth [41]. With this effort, the performance of HLS-based implementations is improved by 3.5× and 8.5× in the applications of K-nearest neighbors (KNN) and sparse matrix-vector multiplication (SpMV).…”
Section: A Performance-utilization Trade-off (mentioning)
confidence: 99%
“…In SyncNN, we use a hierarchical on-chip buffering technique to buffer as many weights as possible, depending on the network size and the on-chip memory size available on the FPGA board. We load the weights in a coalesced (widened bus) and burst fashion [17,40] from the off-chip memory to on-chip memory at different granularity. As shown in Figure 5, for every convolutional layer, the weights are resolved in four dimensions.…”
Section: Memory Access Optimization (mentioning)
confidence: 99%