Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays 2020
DOI: 10.1145/3373087.3375296

Flexible Communication Avoiding Matrix Multiplication on FPGA with High-Level Synthesis

Abstract: Data movement is the dominating factor affecting performance and energy in modern computing systems. Consequently, many algorithms have been developed to minimize the number of I/O operations for common computing patterns. Matrix multiplication is no exception, and lower bounds have been proven and implemented both for shared and distributed memory systems. Reconfigurable hardware platforms are a lucrative target for I/O minimizing algorithms, as they offer full control of memory accesses to the programmer. Wh…

Cited by 41 publications (28 citation statements)
References 31 publications
“…In [14] the authors propose a model to optimize matrix multiplication for FPGA platforms by maximizing performance (computations) and minimizing off-chip I/O accesses. They apply their model to a particular implementation in FPGA using HLS obtaining competitive performance while maintaining high levels of abstraction in the code that allows portability between platforms.…”
Section: Related Work (mentioning)
confidence: 99%
“…For the Xilinx's Data Center platform we tested the GEMM implementation from [14] and developed our version of SPMV based on the work in [18].…”
Section: High-end FPGA (mentioning)
confidence: 99%
“…A very recent paper (de Fine Licht et al, 2019) investigates a high-level synthesis on the FPGA platform. The authors propose a model to optimize the Matrix Matrix Multiplication (MMM) algorithm.…”
Section: Related Workmentioning
confidence: 99%
“…• Challenge 3 -How to design a general-purpose accelerator which does not need to be rerun the time-consuming flow of synthesis/place/route. While many accelerators have been designed for boosting computing performance and efficiency in many application domains such as deep learning [5, 11, 12, 23, 31, 35, 64-69, 77, 87, 88], dense linear algebra [23,29,30,35,77], graph processing [4,17,25,26,39,70,89,91,92,95], genomic and bio analysis [8,9,13,14,33,38,51,76,81], data sorting [10,52,60,63], most are designed for one specific problem with fixed input and output size. For FPGA accelerators even with improved tools such as [17,77], a new design will still consume many hours or even a few days due to long synthesis and place/route time.…”
Section: Introduction (mentioning)
confidence: 99%