2022
DOI: 10.1155/2022/9485933

A Post-training Quantization Method for the Design of Fixed-Point-Based FPGA/ASIC Hardware Accelerators for LSTM/GRU Algorithms

Abstract: Recurrent Neural Networks (RNNs) have become important tools for tasks such as speech recognition, text generation, or natural language processing. However, their inference may involve up to billions of operations and their large number of parameters leads to large storage size and runtime memory usage. These reasons impede the adoption of these models in real-time, on-the-edge applications. Field-Programmable Gate Arrays (FPGAs) and Application-Specific Integrated Circuits (ASICs) have emerged as promising so…
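The paper's own algorithm is not reproduced here, but as a rough illustration of what post-training fixed-point quantization of LSTM/GRU weights involves, the Python sketch below quantizes a weight tensor to signed fixed-point codes and measures the round-trip error. The function name, the 8-bit width, and the power-of-two scale choice are assumptions for the example, not the authors' method.

```python
import numpy as np

def quantize_fixed_point(w, total_bits=8):
    """Illustrative per-tensor post-training fixed-point quantization.

    Picks the largest power-of-two scale (i.e. number of fractional bits)
    such that the tensor's maximum magnitude still fits into a signed
    `total_bits`-wide integer, then rounds the weights onto that grid.
    Returns the integer codes, the fractional-bit count, and the
    dequantized float approximation.
    """
    qmin, qmax = -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1
    max_abs = max(float(np.max(np.abs(w))), 1e-12)
    frac_bits = int(np.floor(np.log2(qmax / max_abs)))
    scale = 2.0 ** frac_bits
    q = np.clip(np.round(w * scale), qmin, qmax).astype(np.int32)
    return q, frac_bits, q.astype(np.float32) / scale

# Quantize a random LSTM-sized weight matrix (4 gates, hidden + input columns)
# and check the worst-case reconstruction error.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(4 * 256, 256 + 128)).astype(np.float32)
q, frac_bits, w_hat = quantize_fixed_point(w, total_bits=8)
print(frac_bits, float(np.max(np.abs(w - w_hat))))
```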

Cited by 6 publications (4 citation statements) | References: 33 publications
“…We will also focus on enabling the acceleration of feedback loops and skip connections within MDE to achieve support for residual and recurrent layers. A post-training quantization algorithm for RNNs has already been defined in one of our previous works [78].…”
Section: Comparison With Related Work (mentioning, confidence: 99%)
“…Weight pruning is used to reduce weight parameters, which can effectively compress network models and improve network performance. In order to solve the problem of large storage demand and network performance degradation caused by a large number of parameters when RNN is applied to natural language processing, the authors of [154] focused on the computing resource demand of RNN and adopted fixed-point quantization technology in order to design an FPGA accelerator, which reduced the memory consumption by 90%, and the accuracy loss was less than 1%.…”
Section: FPGA Accelerator for Natural Language Processing (mentioning, confidence: 99%)
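As a companion to the fixed-point quantization described in the statement above (a generic sketch, not the accelerator design in [154]), the snippet below shows how a fixed-point datapath typically evaluates a dot product: integer multiplies, a wide accumulator, and a final shift back to the output format. The bit widths and the function name are assumptions for the example.

```python
import numpy as np

def fixed_point_dot(x_q, w_q, x_frac_bits, w_frac_bits, out_frac_bits, acc_bits=32):
    """Integer-only dot product, in the style of a fixed-point MAC unit.

    `x_q` and `w_q` hold fixed-point codes with `x_frac_bits` and
    `w_frac_bits` fractional bits. Each product carries the sum of both
    fractional-bit counts; the accumulated result is shifted back to
    `out_frac_bits` and saturated to the `acc_bits`-wide register.
    """
    acc = int(np.sum(x_q.astype(np.int64) * w_q.astype(np.int64)))  # wide accumulator
    shift = x_frac_bits + w_frac_bits - out_frac_bits
    acc = acc >> shift if shift >= 0 else acc << -shift             # rescale by shifting
    lim = 2 ** (acc_bits - 1)
    return max(-lim, min(acc, lim - 1))                             # saturate like hardware
```

Dequantizing the output is then just a division by 2**out_frac_bits, so only integer arithmetic and shifts are needed on the device.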
“…Different from [154], which quantizes the input data, some scholars have devoted themselves to NLP task optimization based on the BERT (bidirectional encoder representation from transformers) network model [155] and have adopted the idea of full quantization to design an accelerator. Not only the input data but also the weights, activations, Softmax, layer normalization, and all intermediate results are quantized in order to compress the network and improve performance [156].…”
Section: FPGA Accelerator for Natural Language Processing (mentioning, confidence: 99%)
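For the fully quantized setting described in the statement above, where activations and intermediate results are quantized rather than only weights, a common post-training ingredient is a calibration pass that fixes each activation scale from a few sample batches. The sketch below is a generic illustration of that idea under the same power-of-two-scale assumption as earlier; it is not the design of [156], and both function names are hypothetical.

```python
import numpy as np

def calibrate_activation_frac_bits(calib_batches, total_bits=8):
    """Choose fractional bits for an activation tensor from calibration data.

    Tracks the largest activation magnitude seen over the calibration
    batches and returns the largest power-of-two scale that keeps that
    value inside the signed `total_bits` range.
    """
    qmax = 2 ** (total_bits - 1) - 1
    max_abs = max(float(np.max(np.abs(b))) for b in calib_batches)
    return int(np.floor(np.log2(qmax / max(max_abs, 1e-12))))

def quantize_activation(a, frac_bits, total_bits=8):
    """Quantize a runtime activation tensor with the calibrated fractional bits."""
    qmin, qmax = -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1
    return np.clip(np.round(a * 2.0 ** frac_bits), qmin, qmax).astype(np.int32)
```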
“…This article has been retracted by Hindawi, as publisher, following an investigation undertaken by the publisher [1]. This investigation has uncovered evidence of systematic manipulation of the publication and peer-review process.…”
mentioning, confidence: 99%