Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays 2018
DOI: 10.1145/3174243.3174258
A Customizable Matrix Multiplication Framework for the Intel HARPv2 Xeon+FPGA Platform

Cited by 69 publications (41 citation statements); references 16 publications.
“…ShiftCNN was shown to obtain 4.2× and 3.8× energy efficiency savings over two baseline CNN platforms using DSP- and LUT-based bit-parallel MACs, respectively. Moss et al. presented an FPGA-based customisable matrix multiplication framework dedicated to DNN inference [100]. Their implementation allows runtime switching between static-precision bit-parallel and dynamic-precision bit-serial MAC implementations.…”
Section: Fixed-point Representation
confidence: 99%
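The bit-parallel/bit-serial distinction in the quote above can be illustrated with a short sketch: a bit-serial MAC processes one activation bit-plane per cycle, so its latency scales with the operand precision while a bit-parallel MAC consumes full-width operands at once. The function below is a minimal illustrative model (the name and interface are assumptions, not the cited framework's API):

```python
def bit_serial_mac(weights, activations, precision):
    """Accumulate dot(weights, activations) one activation bit-plane at a
    time, modelling a dynamic-precision bit-serial MAC (illustrative
    sketch; not taken from the cited framework)."""
    acc = 0
    for b in range(precision):            # one pass per bit of precision
        # Partial product of the weights with bit b of each activation.
        plane = sum(w * ((a >> b) & 1) for w, a in zip(weights, activations))
        acc += plane << b                 # shift-and-add accumulation
    return acc

# Matches the bit-parallel dot product once `precision` covers the
# activations' magnitude:
# bit_serial_mac([3, 2], [5, 7], 4) → 29, i.e. 3*5 + 2*7
```

Lowering `precision` at runtime trades accuracy for fewer passes, which is the dynamic-precision knob the quoted statement refers to.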
“…described in [3]. Other FPGA architectures have been implemented to exploit the highly amenable nature of CNNs that constrain weight parameters to binary or ternary representations [29], [30]. Given the efficiency limits of purely software or purely hardware implementations of neural networks, software-hardware co-design is considered an effective approach to achieving optimal performance [31], [32].…”
Section: Related Work
confidence: 99%
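The ternary-weight restriction mentioned in the quote above amounts to mapping each weight to {-1, 0, +1}, which lets FPGA datapaths replace multipliers with adders and sign logic. A minimal thresholding sketch (the threshold value is an illustrative assumption, not from the cited works):

```python
def ternarize(weights, threshold=0.05):
    """Constrain weights to {-1, 0, +1} by magnitude thresholding — a
    minimal sketch of the ternary-weight restriction (the threshold is
    an illustrative assumption)."""
    return [0 if abs(w) < threshold else (1 if w > 0 else -1)
            for w in weights]

# ternarize([0.3, -0.01, -0.7]) → [1, 0, -1]
```

Binary weights are the same idea without the zero band: every weight maps to +1 or -1 by sign.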
“…Many existing frameworks [9], [25], [31], [23], [33], [39] that map CNN models to FPGAs generate a large homogeneous processing core that is temporally shared among layers. This common design is flexible: by carrying out convolutions sequentially, it is less constrained by the amount of resources available on FPGAs.…”
Section: Related Work
confidence: 99%
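The temporally shared core described in the quote above can be sketched as a single compute engine that every layer reuses in sequence, rather than per-layer hardware. All names below are illustrative assumptions, not from the cited frameworks:

```python
def matvec(w, x):
    """Toy matrix-vector product standing in for the shared FPGA core."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def run_on_shared_core(layers, x):
    """Temporal sharing sketch: each layer's computation (convolutions
    lowered to matrix products) reuses the one `matvec` engine in turn,
    so resource use stays fixed as the network grows deeper."""
    for w in layers:          # layers execute one after another...
        x = matvec(w, x)      # ...time-multiplexing the single core
    return x

# Two 2x2 layers (identity, then 2*identity) on one engine:
# run_on_shared_core([[[1, 0], [0, 1]], [[2, 0], [0, 2]]], [3, 4]) → [6, 8]
```

The trade-off the quote hints at: sequential reuse keeps resource demand low and flexible, at the cost of the layer-level parallelism a per-layer (heterogeneous) pipeline could exploit.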