2020
DOI: 10.1080/03772063.2020.1821797
Efficient CNN Accelerator on FPGA

Cited by 23 publications
(7 citation statements)
References 19 publications
“…As shown in Figure 6, according to different design concepts and requirements, FPGA-based neural network optimization technology can be roughly divided into optimization for data and operation, optimization for bandwidth, and optimization for memory and access, among others, which are introduced in detail below. [71]–[78], fewer computations [79]–[81], improved calculation speed [82]–[85], the Winograd fast convolution algorithm [86]–[91], the Im2col convolution optimization algorithm [92]–[97], pipelined design [98]–[102], the Roofline model [103]–[105], ping-pong caching [106]–[109], input feature map reuse [110,111], filter reuse [111,112], and convolutional reuse [110]…”
Section: Neural Network Optimization Technology Based On FPGA (mentioning)
confidence: 99%
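Among the techniques this citing survey lists, the Im2col convolution optimization algorithm lowers a convolution to one large matrix multiplication so that a GEMM engine can execute it. A minimal pure-Python sketch of the idea (all function names are illustrative, not taken from the cited papers; real accelerators additionally tile these matrices to fit on-chip memory):

```python
# Im2col sketch: lower a 2-D convolution to one matrix multiplication.
# Illustrative only; hardware implementations tile and stream these matrices.

def im2col(image, k):
    """Unroll every k x k patch of `image` (a list of lists) into a row."""
    h, w = len(image), len(image[0])
    rows = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            rows.append([image[i + di][j + dj]
                         for di in range(k) for dj in range(k)])
    return rows

def conv2d_im2col(image, kernel):
    """Valid convolution (no padding) via im2col + per-row dot products."""
    k = len(kernel)
    flat_k = [kernel[di][dj] for di in range(k) for dj in range(k)]
    patches = im2col(image, k)
    out_w = len(image[0]) - k + 1
    flat = [sum(p * q for p, q in zip(row, flat_k)) for row in patches]
    # Re-fold the flat output vector into a 2-D feature map.
    return [flat[r * out_w:(r + 1) * out_w] for r in range(len(flat) // out_w)]
```

For example, a 3×3 input with a 2×2 kernel yields a 4×4 patch matrix multiplied by a length-4 kernel vector, i.e. `conv2d_im2col([[1,2,3],[4,5,6],[7,8,9]], [[1,0],[0,1]])` gives `[[6, 8], [12, 14]]`.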
“…In 2019, Asgar Abbaszadeh et al. [83] proposed a universal square-matrix computing unit based on a cyclic matrix structure and tested a 500 × 500 matrix on an FPGA running at 346 MHz, achieving a throughput of 173 GOPS. In 2020, S. Kala and S. Nalesh [84] proposed an efficient CNN accelerator based on a blocked Winograd GEMM (general matrix multiplication) architecture. Blocking was used to improve bandwidth and storage efficiency, and the ResNet-18 CNN model was implemented on an XC7VX690T FPGA.…”
Section: Winograd Fast Convolution Algorithm (mentioning)
confidence: 99%
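The Winograd approach referenced in [84] builds on minimal filtering algorithms such as F(2,3), which produces two outputs of a 3-tap filter with 4 multiplications instead of 6. A small sketch using the standard F(2,3) transform matrices (the GEMM blocking of the cited accelerator is not reproduced here):

```python
# Winograd F(2,3): two outputs of a 1-D 3-tap convolution
# using 4 elementwise multiplications instead of 6.

def winograd_f23(d, g):
    """d: 4 input samples, g: 3 filter taps -> 2 outputs [y0, y1]."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Input transform B^T d
    t0, t1, t2, t3 = d0 - d2, d1 + d2, d2 - d1, d1 - d3
    # Filter transform G g (precomputable once per filter)
    u0 = g0
    u1 = (g0 + g1 + g2) / 2
    u2 = (g0 - g1 + g2) / 2
    u3 = g2
    # Elementwise (Hadamard) products: the 4 multiplications
    m0, m1, m2, m3 = t0 * u0, t1 * u1, t2 * u2, t3 * u3
    # Output transform A^T m
    return [m0 + m1 + m2, m1 - m2 - m3]
```

The result matches direct convolution: y0 = d0·g0 + d1·g1 + d2·g2 and y1 = d1·g0 + d2·g1 + d3·g2, which is why a hardware design can trade multipliers for the cheap add/subtract transforms.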
“…CNNs are composed of multiple layers of operations, such as convolution, pooling, ReLU, local response normalization, fully connected computation, and softmax [22], where the convolution layers are the key layers of the CNNs. Convolution operations are inspired by biological processes [23], in that the connectivity pattern between neurons resembles the organization of the human visual cortex.…”
Section: Background 2.2.1 Convolution Operation (mentioning)
confidence: 99%
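The layer operations named in that excerpt can be sketched compactly; a minimal pure-Python illustration of sliding-window convolution, ReLU, and 2×2 max pooling (illustrative only, not the cited accelerator's implementation):

```python
# Minimal sketches of common CNN layer operations:
# sliding-window convolution, ReLU, and 2x2 max pooling.

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as is usual in CNNs)."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(k) for dj in range(k))
             for j in range(w - k + 1)]
            for i in range(h - k + 1)]

def relu(fmap):
    """Elementwise max(0, x) over a 2-D feature map."""
    return [[max(0, v) for v in row] for row in fmap]

def maxpool2x2(fmap):
    """Non-overlapping 2x2 max pooling with stride 2."""
    return [[max(fmap[i][j], fmap[i][j + 1],
                 fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]
```

Chaining them, `maxpool2x2(relu(conv2d(image, kernel)))`, mirrors the conv → activation → pooling structure the excerpt describes.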
“…[72] implemented hybrid convolution on FPGA and analysed which cases suit FFT convolution and which suit Winograd convolution. [35], [73], [74], [75] unified the realization of Winograd convolution with kernel matrix multiplication and maximized the reusability of the module. [76], [77] conducted a comprehensive design-space exploration of Winograd convolution implementations on FPGA.…”
Section: CPU (mentioning)
confidence: 99%