Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emerging DNN hardware accelerators are beginning to support mixed precision (1-8 bits) to further improve computation efficiency, which raises the challenge of finding the optimal bitwidth for each layer: it requires domain experts to explore a vast design space, trading off accuracy, latency, energy, and model size, a process that is both time-consuming and usually sub-optimal. There are plenty of specialized hardware accelerators for neural networks, but little research has been done on designing specialized neural networks optimized for a particular hardware accelerator. The latter is demanding, given that the design cycle of silicon is much longer than that of neural networks. Conventional quantization algorithms ignore the differences between hardware architectures and quantize all the layers in a uniform way. In this paper, we introduce the Hardware-Aware Automated Quantization (HAQ) framework, which automatically determines the quantization policy while taking the hardware accelerator's feedback into the design loop. Rather than relying on proxy signals such as FLOPs and model size, we use direct hardware feedback (latency and energy) to guide the search for the quantization policy.
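To make the per-layer mixed-precision setting concrete, below is a minimal sketch (not the paper's implementation) of linearly quantizing each layer's weights to a chosen bitwidth, the basic operation whose per-layer bitwidths a hardware-aware search would assign. The symmetric max-abs scaling, the toy layer shapes, and the example bitwidth policy are illustrative assumptions.

```python
import numpy as np

def linear_quantize(w: np.ndarray, n_bits: int) -> np.ndarray:
    """Quantize weights to a symmetric signed n_bits grid (assumption: max-abs scaling)."""
    assert n_bits >= 2, "this sketch assumes at least 2 bits for signed weights"
    q_max = 2 ** (n_bits - 1) - 1              # e.g. 127 for 8 bits, 7 for 4 bits
    scale = np.abs(w).max() / q_max            # map the largest weight onto the grid edge
    q = np.clip(np.round(w / scale), -q_max, q_max)
    return q * scale                           # de-quantize back to float for simulation

# Toy example: different bitwidths per layer, as a mixed-precision policy would assign.
layers = {
    "conv1": np.random.randn(64, 3, 3, 3),
    "conv2": np.random.randn(128, 64, 3, 3),
    "fc":    np.random.randn(1000, 128),
}
policy = {"conv1": 8, "conv2": 4, "fc": 6}     # hypothetical per-layer bitwidths

quantized = {name: linear_quantize(w, policy[name]) for name, w in layers.items()}

# A hardware-aware search would score each candidate policy with measured or
# simulated latency/energy plus task accuracy, rather than a proxy such as FLOPs.
model_bits = sum(w.size * policy[name] for name, w in layers.items())
print(f"quantized weight storage: {model_bits / 8 / 1e6:.2f} MB")
```

The sketch only shows how a per-layer bitwidth assignment changes the quantized weights and the resulting storage cost; the search over such assignments, driven by direct hardware feedback, is what the HAQ framework automates.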