2018 28th International Conference on Field Programmable Logic and Applications (FPL) 2018
DOI: 10.1109/fpl.2018.00059

BISMO: A Scalable Bit-Serial Matrix Multiplication Overlay for Reconfigurable Computing

Abstract: Matrix-matrix multiplication is a key computational kernel for numerous applications in science and engineering, with ample parallelism and data locality that lends itself well to high-performance implementations. Many matrix multiplication-dependent applications can use reduced-precision integer or fixed-point representations to increase their performance and energy efficiency while still offering adequate quality of results. However, precision requirements may vary between different application phases or depen…

Citations: cited by 90 publications (54 citation statements)
References: 10 publications
“…Algorithm 1 represents a straightforward algorithm by which bit-serial matrix multiplication can be performed [6], with Equation 1 representing how a single element of the result matrix is calculated. Priority is given to the calculation of the inner product of two binary matrices (e.g., L[0] • R[0]) before proceeding to calculate the inner product of another bit-precision (e.g., L[0] • R[1]).…”
Section: Bit-serial Matrix Multiplication (mentioning)
confidence: 99%
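
The bit-serial scheme quoted above can be illustrated with a minimal Python sketch. It is not the cited paper's Algorithm 1 or Equation 1, which are not reproduced on this page; it assumes unsigned integer operands, and the names `bit_plane`, `binary_matmul`, and `bit_serial_matmul` are hypothetical. Each operand is split into binary matrices (one per bit position), and each pair L[i] • R[j] is multiplied as a binary matrix product, weighted by 2^(i+j), and accumulated. The loop order follows the quoted priority: L[0] • R[0] first, then L[0] • R[1].

```python
def bit_plane(matrix, bit):
    """Return the 0/1 matrix holding bit `bit` of every (unsigned) entry."""
    return [[(x >> bit) & 1 for x in row] for row in matrix]


def binary_matmul(a, b):
    """Multiply two 0/1 matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x & y for x, y in zip(row, col)) for col in cols] for row in a]


def bit_serial_matmul(lhs, rhs, lhs_bits, rhs_bits):
    """Unsigned bit-serial product (assumption: no sign handling).

    Accumulates 2^(i+j) * (L[i] . R[j]) over all pairs of bit positions.
    """
    rows, cols = len(lhs), len(rhs[0])
    result = [[0] * cols for _ in range(rows)]
    for i in range(lhs_bits):            # bit position of the left operand
        l_plane = bit_plane(lhs, i)
        for j in range(rhs_bits):        # bit position of the right operand
            partial = binary_matmul(l_plane, bit_plane(rhs, j))
            for r in range(rows):
                for c in range(cols):
                    result[r][c] += partial[r][c] << (i + j)
    return result


def reference_matmul(a, b):
    """Ordinary integer matrix product, used only as a cross-check."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


lhs = [[3, 1], [2, 0]]   # entries fit in 2 bits
rhs = [[1, 2], [3, 1]]   # entries fit in 2 bits
product = bit_serial_matmul(lhs, rhs, lhs_bits=2, rhs_bits=2)
```

With 2-bit operands on both sides the sketch performs 2 × 2 = 4 binary matrix products, so the work grows with the product of the two bit widths.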
“…An accelerator optimized for performing binary matrix multiplication is not sufficient to take advantage of the locality-aware scheduling algorithm described in the previous section (Algorithm 2). BISMO [6] was designed for efficient computation of binary matrix multiplications but due to its software programmability it provides the necessary flexibility to evaluate a large variety of different scheduling algorithms.…”
Section: The Bismo Accelerator (mentioning)
confidence: 99%
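
The quoted statement does not say how a binary matrix multiplication is evaluated. One common approach, assumed here purely for illustration, packs each binary vector into a machine word and computes the dot product as a bitwise AND followed by a population count. The helper names `pack_bits` and `binary_dot` are hypothetical.

```python
def pack_bits(bits):
    """Pack a list of 0/1 values into one integer, element k at bit k."""
    word = 0
    for k, b in enumerate(bits):
        word |= (b & 1) << k
    return word


def binary_dot(a_word, b_word):
    """Dot product of two packed binary vectors: popcount(a AND b)."""
    return bin(a_word & b_word).count("1")


a = pack_bits([1, 0, 1, 1])
b = pack_bits([1, 1, 0, 1])
dot = binary_dot(a, b)   # positions 0 and 3 are set in both vectors
```

A binary matrix product then reduces to one such AND-popcount per pair of row and column words.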
“…This flexibility was originally not supported by chip vendors until recently the hardware manufacturers started to implement this feature: Apple released the A12 Bionic chip that supports flexible bits for the neural network inference (Apple 2018); NVIDIA recently introduced the Turing GPU architecture that supports 1-bit, 4-bit, 8-bit and 16-bit arithmetic operations (Nvidia 2018); Imagination launched a flexible neural network IP that supports per-layer bitwidth adjustment for both weights and activations (Imagination 2018). Besides industry, recently academia also works on the bit-level flexible hardware design: BISMO (Umuroglu et al 2018) proposed the bit-serial multiplier to support multiplications of 1 to 8 bits; BitFusion (Sharma et al 2018) supports multiplications of 2, 4, 8 and 16 bits in a spatial manner.…”
Section: Introduction (mentioning)
confidence: 99%