In recent years, Convolutional Neural Networks (CNNs) have received widespread attention in the field of machine learning due to their high accuracy in character recognition and image classification. Nevertheless, the compute-intensive and memory-intensive characteristics of CNNs pose significant challenges to general-purpose processors, which must support a wide variety of workloads. As a result, a large number of CNN-specific hardware accelerators have emerged to improve efficiency. Although highly efficient, previous accelerators often lack flexibility. In this study, classical CNN models are analyzed, and a domain-specific instruction set of 10 matrix instructions, called RV-CNN, is designed based on the promising RISC-V architecture. By abstracting CNN computation into instructions, the proposed design provides sufficient flexibility for CNN workloads and achieves higher code density than general-purpose ISAs. On this basis, a code-to-instruction mapping mechanism is proposed. Using RV-CNN to build different CNN models on the Xilinx ZC702, this paper finds that, compared to x86 processors, RV-CNN achieves on average 141 times the energy efficiency and 8.91 times the code density; compared to a GPU, it achieves on average 1.25 times the energy efficiency and 1.95 times the code density. In addition, compared to previous CNN accelerators, the proposed design supports typical CNN models while maintaining high energy efficiency.