Optimized Compression for Implementing Convolutional Neural Networks on FPGA

2019 · DOI: 10.3390/electronics8030295

Abstract: The field programmable gate array (FPGA) is widely considered a promising platform for convolutional neural network (CNN) acceleration. However, the large number of parameters in CNNs imposes heavy computing and memory burdens on FPGA-based CNN implementations. To solve this problem, this paper proposes an optimized compression strategy and realizes an FPGA-based accelerator for CNNs. Firstly, a reversed-pruning strategy is proposed which reduces the number of parameters of AlexNet by a factor of 13× without…
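As background for the pruning result quoted in the abstract, the following is a minimal sketch of generic magnitude-based weight pruning in NumPy. It is not the paper's reversed-pruning algorithm, whose layer ordering is not described in the excerpt above; the threshold rule and the 90% sparsity target are illustrative assumptions.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude entries so that roughly
    `sparsity` of the weights become zero.

    Generic illustration only: the paper's reversed-pruning strategy
    additionally chooses the order in which layers are pruned, which
    is not modeled here.
    """
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    flat = np.abs(weights).ravel()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

# Example: prune a mock fully connected layer to ~90% sparsity.
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 128)).astype(np.float32)
w_pruned = magnitude_prune(w, sparsity=0.9)
print(f"nonzero fraction: {np.count_nonzero(w_pruned) / w_pruned.size:.3f}")
```

Storing only the surviving nonzero weights (for example in a compressed sparse row layout) is what turns this sparsity into the on-chip memory savings that matter on an FPGA.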

Cited by 58 publications (23 citation statements) · References 21 publications

“…In order to obtain a more robust platform, layers templates will be improved by adding the capacity to infer quantized and pruned convolutional neural networks, reducing drastically the number of operations and memory required for benchmark CNNs (e.g., AlexNet), as explained in [27].…”
Section: Discussion (mentioning)
confidence: 99%
“…In FPGA technology, compression techniques are suitable to reduce redundant parameters and memory footprint, which has direct impact in the power consumption, speed and resource use [32][33][34][35]. Cheng et al [36] presented a review of the state of the art in compression techniques, summarizing the different approaches in: parameter pruning and quantization, low-rank factorization, transferred/compact convolutional filters and knowledge distillation.…”
Section: Perspectives for an FPGA Realization of the NLCN (mentioning)
confidence: 99%
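The citation above groups parameter pruning with quantization. As a hedged sketch of the quantization half, the code below performs symmetric per-tensor post-training quantization to int8; real deployments often use per-channel scales and calibration data, which are omitted here.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor post-training quantization to int8."""
    scale = max(float(np.max(np.abs(weights))) / 127.0, 1e-12)  # avoid /0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(1)
w = rng.normal(scale=0.05, size=(64, 64)).astype(np.float32)
q, s = quantize_int8(w)
print(f"max reconstruction error: {np.max(np.abs(dequantize(q, s) - w)):.6f}")
```

Replacing 32-bit floats with 8-bit integers cuts weight storage by 4× and lets FPGA DSP blocks use narrower multipliers, which is the direct link to the resource and power savings the quote mentions.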
“…DiracDeltaNet [40] is based on ShuffleNet [49], but replaces convolutions with shift operations and uses PACT quantization to classify at 58.7 fps on an FPGA. The architecture described in [50] uses reverse-pruning and peak-pruning strategies to improve the compression factor in AlexNet [16] without sacrificing accuracy. The authors of [51] create a design flow to implement CNN inference in FPGA-based SoCs using high-level synthesis (HLS).…”
Section: Related Work (mentioning)
confidence: 99%
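The DiracDeltaNet citation name-drops PACT quantization. As a rough sketch of that idea (forward pass only; the bit-width and the clipping bound alpha, which PACT learns during training, are illustrative values here):

```python
import numpy as np

def pact_forward(x: np.ndarray, alpha: float, k: int = 4) -> np.ndarray:
    """PACT-style activation quantization, inference arithmetic only.

    Activations are clipped to [0, alpha] and uniformly quantized to
    2**k - 1 positive levels. In PACT, alpha is a learnable parameter
    trained by gradient descent; that training loop is omitted here.
    """
    clipped = np.clip(x, 0.0, alpha)
    scale = alpha / (2**k - 1)
    return np.round(clipped / scale) * scale

x = np.linspace(-1.0, 3.0, 9)
print(pact_forward(x, alpha=2.0, k=2))  # quantized to {0, 2/3, 4/3, 2}
```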