CNN-MERP: An FPGA-based memory-efficient reconfigurable processor for forward and backward propagation of convolutional neural networks

Han, Xushen; Zhou, Dajiang; Wang, Shihao

doi:10.1109/iccd.2016.7753296

Cited by 26 publications

(8 citation statements)

References 17 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…In addition to publications that focus on the acceleration of DNN inference, some publications tackle the problem of implementing backpropagation for neural network training on FPGAs as well. For example, [92] and [29] implement frameworks for CNN training on FPGAs, and [76] explores the training of LSTM layers on FPGAs. With approaches like these, it would be possible to implement FPGA-based DRL architectures with models including CNN and LSTM layers.…”

Section: Neural Network In Fpga-based Drl Implementationsmentioning

confidence: 99%

A Survey of Domain-Specific Architectures for Reinforcement Learning

Rothmann

Porrmann

2022

IEEE Access

View full text Add to dashboard Cite

Reinforcement learning algorithms have been very successful at solving sequential decisionmaking problems in many different problem domains. However, their training is often time-consuming, with training times ranging from multiple hours to weeks. The development of domain-specific architectures for reinforcement learning promises faster computation times, decreased experiment turn-around time, and improved energy efficiency. This paper presents a review of hardware architectures for the acceleration of reinforcement learning algorithms. FPGA-based implementations are the focus of this work, but GPU-based approaches are considered as well. Both tabular and deep reinforcement learning algorithms are included in this survey. The techniques employed in different implementations are highlighted and compared. Finally, possible areas for future work are suggested, based on the preceding discussion of existing architectures.

show abstract

Section: Neural Network In Fpga-based Drl Implementationsmentioning

confidence: 99%

A Survey of Domain-Specific Architectures for Reinforcement Learning

Rothmann

Porrmann

2022

IEEE Access

View full text Add to dashboard Cite

show abstract

“…There is also some special idea on accelerating CNNs. The authors of [15, 16] have utilised the reconfigurability of the FPGA to create a runtime configurable CNNs accelerator. This really saves a lot of resources but spend too much time on configuring the FPGA before computation.…”

Section: Related Workmentioning

confidence: 99%

XNORCONV: CNNs accelerator implemented on FPGA using a hybrid CNNs structure and an inter‐layer pipeline method

Zhang

2020

IET Image Processing

View full text Add to dashboard Cite

“…Due to the high-performance, reconfigurability and energy-efficient nature of FPGAs, many FPGA-based accelerators [14][15][16][17][18] have been proposed that can implement CNNs; these have achieved high throughput and improved energy efficiency. Several novel reconfiguration architectures were proposed in [14] that improve the sum-of-products operations used in the convolutional kernels of CNNs.…”

Section: Related Workmentioning

confidence: 99%

“…In [15], a modified Caffe CNN framework is presented; this framework implements CNNs using FPGAs, allowing transparent support to be given to individual FPGA implementation of CNN layers. In 2016, CNN-MERP, a CNN processor incorporating an efficient memory hierarchy, was produced by Han et al [16]; this processor was shown to have significantly lower bandwidth requirements. Bettoni et al [17] proposed an FPGA implementation of CNNs in low-power embedded systems; this study addressed portability and power efficiency.…”

Section: Related Workmentioning

confidence: 99%

Efficient Implementation of 2D and 3D Sparse Deconvolutional Neural Networks with a Uniform Architecture on FPGAs

et al. 2019

View full text Add to dashboard Cite

Three-dimensional (3D) deconvolution is widely used in many computer vision applications. However, most previous works have only focused on accelerating two-dimensional (2D) deconvolutional neural networks (DCNNs) on Field-Programmable Gate Arrays (FPGAs), while the acceleration of 3D DCNNs has not been well studied in depth as they have higher computational complexity and sparsity than 2D DCNNs. In this paper, we focus on the acceleration of both 2D and 3D sparse DCNNs on FPGAs by proposing efficient schemes for mapping 2D and 3D sparse DCNNs on a uniform architecture. Firstly, a pruning method is used to prune unimportant network connections and increase the sparsity of weights. After being pruned, the number of parameters of DCNNs is reduced significantly without accuracy loss. Secondly, the remaining non-zero weights are encoded in coordinate (COO) format, reducing the memory demands of parameters. Finally, to demonstrate the effectiveness of our work, we implement our accelerator design on the Xilinx VC709 evaluation platform for four real-life 2D and 3D DCNNs. After the first two steps, the storage required of DCNNs is reduced up to 3.9×. Results show that the performance of our method on the accelerator outperforms that of the our prior work by 2.5× to 3.6× in latency.

show abstract

CNN-MERP: An FPGA-based memory-efficient reconfigurable processor for forward and backward propagation of convolutional neural networks

Cited by 26 publications

References 17 publications

A Survey of Domain-Specific Architectures for Reinforcement Learning

A Survey of Domain-Specific Architectures for Reinforcement Learning

XNORCONV: CNNs accelerator implemented on FPGA using a hybrid CNNs structure and an inter‐layer pipeline method

Efficient Implementation of 2D and 3D Sparse Deconvolutional Neural Networks with a Uniform Architecture on FPGAs

Contact Info

Product

Resources

About