A Uniform Architecture Design for Accelerating 2D and 3D CNNs on FPGAs

Liu, Zhiqiang; Chow, Paul; Xu, Jinwei; Jiang, Jingfei; Dou, Yong; Zhou, Jie

doi:10.3390/electronics8010065

Cited by 52 publications

(29 citation statements)

References 15 publications

“…It is obvious that the error dynamics in Equations (18) and (22) are driven by acceleration, which is manifested by velocity change. If we define acceleration a_k = ∆v^a_k/T_s that is always bounded in practice such that |a_k| ≤ a_max, then by considering 0 ≤ δt_k < T_s, we can evaluate the r.h.s.…”

Section: The MT-type Division-less Algorithm of the Second Order (mentioning)

confidence: 99%

“…In the case of the sinusoidal variation of δt, we can observe relatively good tracking for both algorithms; however, a more smooth response with slightly better accuracy was achieved by the DLMT1 algorithm (as can be observed by comparison of Figure 8c,d). The tracking performance of the DLMT1 algorithm is better than the DLMT2 algorithm simply because the velocity error in the first case is affected by the difference ∆δt only (see Equation (38)), whereas in the second case, the δt may have a significant impact as well (see Equation (18)). In the case of the sawtooth variation of δt, we can observe high-velocity peaks when the sawtooth generates abrupt changes from zero to one.…”

Section: The Experimental System (mentioning)

confidence: 99%

“…FPGAs can also ensure highly accurate and short sampling periods, which are required for advanced motion control applications. A short processing time of control algorithms is also possible; however, the feature-rich FPGAs have overwhelming resources that are required for performing complex computation algorithms [17,18] yet are still expensive, which may prevent their further use in low-cost applications. One of the challenges in the electronics industry nowadays concerns fitting complex circuitry into small spaces.…”

Section: Introduction (mentioning)

confidence: 99%

The Improved Division-Less MT-Type Velocity Estimation Algorithm for Low-Cost FPGAs

Hace

2019

Electronics

Advanced motion control applications require smooth and highly accurate high-bandwidth velocity feedback, which is usually provided by an incremental encoder. Furthermore, high sampling rates are also demanded in order to achieve cutting-edge system performance. Such control system performance with high accuracy can be achieved easily by FPGA-based controllers. On the other hand, the well-known MT method for velocity estimation has been well proven in practice. However, its complexity, which is related to the inherent arithmetic division involved in the calculus part of the method, prevents its holistic implementation as a single-chip solution on small-size low-cost FPGAs that are suitable for practical optimized control systems. In order to overcome this obstacle, we proposed a division-less MT-type algorithm that consumes only minimal FPGA resources, which makes it proper for modern cost-optimized FPGAs. In this paper, we present new results. The recursive discrete algorithm has been further optimized, in order to improve the accuracy of the velocity estimation. The novel algorithm has also been implemented on the experimental FPGA board, and validated by practical experiments. The enhanced algorithm design resulted in improved practical performance.

“…Meng develops a deep model, termed the Gabor CNN, to address the computing-resource-saving problem [13]. Liu proposes a uniform architecture design by mapping convolutions to matrix multiplications for accelerating both two dimensional (2D) and three dimensional (3D) CNNs [14]. Despite the progress made, those accelerators take the algorithm as a black box, only focusing on hardware architecture optimization.…”

Section: Introduction (mentioning)

confidence: 99%
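The citation statement above describes mapping convolutions to matrix multiplications. A minimal sketch of that general idea (often called im2col) follows, in plain Python. It is an illustration only, not the accelerator design from the cited paper: the helper names `im2col`, `conv2d_as_matmul`, and `conv2d_direct` are hypothetical, and stride 1 with no padding is assumed.

```python
# Sketch: a 2D convolution (CNN-style, no kernel flip) computed as a
# matrix-vector product over unrolled input windows.
# Assumptions: stride 1, no padding, single channel, single kernel.


def im2col(image, kh, kw):
    """Unroll every kh x kw window of a 2D list into one flat row."""
    h, w = len(image), len(image[0])
    rows = []
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            rows.append(
                [image[i + di][j + dj] for di in range(kh) for dj in range(kw)]
            )
    return rows


def conv2d_as_matmul(image, kernel):
    """Multiply the unrolled window matrix by the flattened kernel."""
    kh, kw = len(kernel), len(kernel[0])
    patches = im2col(image, kh, kw)
    flat_kernel = [v for row in kernel for v in row]
    out_w = len(image[0]) - kw + 1
    flat_out = [sum(a * b for a, b in zip(p, flat_kernel)) for p in patches]
    # Fold the flat result back into a 2D output map.
    return [flat_out[r:r + out_w] for r in range(0, len(flat_out), out_w)]


def conv2d_direct(image, kernel):
    """Reference sliding-window implementation for comparison."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, 1]]
print(conv2d_as_matmul(image, kernel))
```

With several kernels, the flattened kernels would form the columns of a weight matrix, so the whole layer becomes one matrix-matrix product. A 3D convolution can be unrolled the same way by adding a depth index to each window, which is one reason a single matrix-multiplication engine can plausibly serve both cases.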

Optimized Compression for Implementing Convolutional Neural Networks on FPGA

et al. 2019

Field programmable gate array (FPGA) is widely considered as a promising platform for convolutional neural network (CNN) acceleration. However, the large numbers of parameters of CNNs cause heavy computing and memory burdens for FPGA-based CNN implementation. To solve this problem, this paper proposes an optimized compression strategy, and realizes an accelerator based on FPGA for CNNs. Firstly, a reversed-pruning strategy is proposed which reduces the number of parameters of AlexNet by a factor of 13× without accuracy loss on the ImageNet dataset. Peak-pruning is further introduced to achieve better compressibility. Moreover, quantization gives another 4× with negligible loss of accuracy. Secondly, an efficient storage technique, which aims for the reduction of the whole overhead cache of the convolutional layer and the fully connected layer, is presented respectively. Finally, the effectiveness of the proposed strategy is verified by an accelerator implemented on a Xilinx ZCU104 evaluation board. By improving existing pruning techniques and the storage format of sparse data, we significantly reduce the size of AlexNet by 28×, from 243 MB to 8.7 MB. In addition, the overall performance of our accelerator achieves 9.73 fps for the compressed AlexNet. Compared with the central processing unit (CPU) and graphics processing unit (GPU) platforms, our implementation achieves 182.3× and 1.1× improvements in latency and throughput, respectively, on the convolutional (CONV) layers of AlexNet, with an 822.0× and 15.8× improvement for energy efficiency, separately. This novel compression strategy provides a reference for other neural network applications, including CNNs, long short-term memory (LSTM), and recurrent neural networks (RNNs).

“…The current rise in the complexity of the applications and the increment of the capabilities of silicon technologies, as well as the so called time to market constrain, make HLS methodologies and tools of mandatory use in the near future [1]. Due to the multiple commercial solutions that can be found in the market for multiprocessor system-on-chips (MPSoCs) nowadays, it is strictly necessary to improve its techniques and methodologies [2] so that the technology is able to deal with the multiple implementation possibilities by using high-level design [3,4].…”

Section: Introduction (mentioning)

confidence: 99%

High-Level Synthesis of Multiclass SVM Using Code Refactoring to Classify Brain Cancer from Hyperspectral Images

et al. 2019

Currently, high-level synthesis (HLS) methods and tools are a highly relevant area in the strategy of several leading companies in the field of system-on-chips (SoCs) and field programmable gate arrays (FPGAs). HLS facilitates the work of system developers, who benefit from integrated and automated design workflows, considerably reducing the design time. Although many advances have been made in this research field, there are still some uncertainties about the quality and performance of the designs generated with the use of HLS methodologies. In this paper, we propose an optimization of the HLS methodology by code refactoring using Xilinx SDSoC™ (Software-Defined System-On-Chip). Several options were analyzed for each alternative through code refactoring of a multiclass support vector machine (SVM) classifier written in C, using two different Zynq®-7000 SoC devices from Xilinx, the ZC7020 (ZedBoard) and the ZC7045 (ZC706). The classifier was evaluated using a brain cancer database of hyperspectral images. The proposed methodology not only reduces the required resources, using less than 20% of the FPGA, but also reduces the power consumption by 23% compared to the full implementation. The speedup obtained of 2.86× (ZC7045) is the highest found in the literature for SVM hardware implementations.
