2020
DOI: 10.1109/tcad.2019.2912923

A Streaming Dataflow Engine for Sparse Matrix-Vector Multiplication Using High-Level Synthesis

Abstract: Using high-level synthesis techniques, this paper proposes an adaptable high-performance streaming dataflow engine for sparse matrix dense vector multiplication (SpMV) suitable for embedded FPGAs. As SpMV is a memory-bound algorithm, this engine combines the three concepts of loop pipelining, dataflow graph, and data streaming to utilize most of the memory bandwidth available to the FPGA. The main goal of this paper is to show that FPGAs can provide comparable performance for memory-bound applications to th…
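As a rough illustration of the streaming idea named in the abstract, the sketch below expresses SpMV as three chained stages, each consuming the previous stage's output one element at a time. It is a software analogy only, not the paper's hardware design. The CSR storage layout, the function names (`stream_nonzeros`, `multiply`, `accumulate`, `spmv_csr`), and the example matrix are all assumptions made for this illustration.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple


def stream_nonzeros(
    row_ptr: Sequence[int], col_idx: Sequence[int], values: Sequence[float]
) -> Iterator[Tuple[int, int, float]]:
    """Stage 1 (assumed CSR layout): emit (row, col, value) in storage order,
    mimicking a sequential read of the matrix from memory."""
    for row in range(len(row_ptr) - 1):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            yield row, col_idx[k], values[k]


def multiply(
    stream: Iterable[Tuple[int, int, float]], x: Sequence[float]
) -> Iterator[Tuple[int, float]]:
    """Stage 2: one multiplication per streamed nonzero."""
    for row, col, val in stream:
        yield row, val * x[col]


def accumulate(stream: Iterable[Tuple[int, float]], n_rows: int) -> List[float]:
    """Stage 3: sum the partial products belonging to each output row."""
    y = [0.0] * n_rows
    for row, prod in stream:
        y[row] += prod
    return y


def spmv_csr(
    row_ptr: Sequence[int],
    col_idx: Sequence[int],
    values: Sequence[float],
    x: Sequence[float],
) -> List[float]:
    """Chain the three stages; generators pass one element at a time."""
    n_rows = len(row_ptr) - 1
    return accumulate(multiply(stream_nonzeros(row_ptr, col_idx, values), x), n_rows)


# Hypothetical 3x3 example: [[1, 0, 2], [0, 0, 0], [0, 3, 4]] times [1, 2, 3]
row_ptr = [0, 2, 2, 4]
col_idx = [0, 2, 1, 2]
values = [1.0, 2.0, 3.0, 4.0]
x = [1.0, 2.0, 3.0]
y = spmv_csr(row_ptr, col_idx, values, x)
```

Because each stage only ever holds one element, the matrix is read exactly once in storage order. That access pattern is what makes a streaming formulation attractive for a memory-bound kernel; in hardware the stages would run concurrently rather than being interleaved by a single thread as they are here.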

Citations: cited by 28 publications (6 citation statements)
References: 22 publications
“…This work proposes an optimized algorithm structure for MGS-QRD through loop optimization techniques and algorithm restructuring, improved TMI by eliminating redundant matrix operations and the right choice of matrix multiplication to be integrated into a fast and efficient pseudoinverse computation hardware accelerator on FPGA. FPGA's synthesizable logic fabrics, which allow high parallelism, enable the delivering of high computational speed as demonstrated in some previous works; [6][7][8][9] moreover, it also provides flexibility and programmability with real-time embedded environment solutions. These advantages make FPGA a good potential as a hardware accelerator.…”
Section: Introduction (mentioning)
confidence: 93%
“…The next step consisted of comparing the results with an open library of the SpMV in FPGAs [14]. The library implements a stream version of the SpMV in the traditional CSR format.…”
Section: Comparison With An Open Library (mentioning)
confidence: 99%
“…A cache vector strategy was presented in [12] and later utilized in [13], the caching scheme aims at maximizing the data reuse of the multiplying vector by performing a preprocess that determines the cache misses that later work as an input for the algorithm. The cache misses have also been treated by fully transferring the multiplying vector into the BRAM [14]. These works aim to be efficient in a broad set of matrices that does not profit the problem information of the sparsity pattern in our CFD matrices.…”
Section: Introduction (mentioning)
confidence: 99%
“…In [18] the authors propose a streaming dataflow architecture to perform SPMV operation in an embedded platform containing a Xilinx ZynqMP FPGA. The proposed solution consists of a deep pipeline that is constantly consuming input data with no stalls.…”
Section: Related Work (mentioning)
confidence: 99%
“…For the Xilinx's Data Center platform we tested the GEMM implementation from [14] and developed our version of SPMV based on the work in [18].…”
Section: High-end FPGA (mentioning)
confidence: 99%