2019
DOI: 10.1007/978-3-030-17227-5_28

Faster Convolutional Neural Networks in Low Density FPGAs Using Block Pruning

Cited by 7 publications (5 citation statements). References 15 publications.
“…In FPGA technology, compression techniques are suitable for reducing redundant parameters and the memory footprint, which has a direct impact on power consumption, speed and resource use [32][33][34][35]. Cheng et al. [36] presented a review of the state of the art in compression techniques, summarizing the different approaches as: parameter pruning and quantization, low-rank factorization, transferred/compact convolutional filters and knowledge distillation.…”
Section: Perspectives for an FPGA Realization of the NLCN (mentioning)
confidence: 99%
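
The compression families listed in the statement above are easy to sketch in code. Below is a minimal, hypothetical NumPy illustration of the first family, parameter pruning and quantization; the function names, the 50% sparsity target and the 8-bit width are illustrative assumptions, not the method of the cited works.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    # Zero the `sparsity` fraction of weights with the smallest magnitude.
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def uniform_quantize(weights: np.ndarray, bits: int = 8) -> np.ndarray:
    # Symmetric uniform quantization; returns dequantized floats.
    # On an FPGA the integer codes `q` would be stored, shrinking memory.
    scale = np.max(np.abs(weights)) / (2 ** (bits - 1) - 1)
    q = np.round(weights / scale).astype(np.int32)
    return q * scale

w = np.random.default_rng(0).standard_normal((64, 64)).astype(np.float32)
w_small = uniform_quantize(magnitude_prune(w, sparsity=0.5), bits=8)
```

Pruning removes parameters (cutting memory footprint and traffic), while quantization narrows the word length of those that remain; both map directly onto the FPGA power, speed and resource savings the statement describes.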
“…This generates unbalanced parallelism in the computation of output maps and irregular accesses to on-chip memory. A few approaches have been proposed to reduce the effects of sparsity [135,136]. In these works, pruning was guided by the datapath of the target computing processor so that it could take advantage of the available computing parallelism.…”
Section: Hardware-oriented Deep Neural Network Optimizations (mentioning)
confidence: 99%
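
A minimal sketch may help make the datapath-guided pruning concrete. Assuming a processor that consumes weights in fixed-width blocks (the block size of 8 and the L1-norm scoring are illustrative assumptions, not the exact criterion of [135,136]), pruning whole blocks rather than individual weights keeps the parallel lanes balanced and the on-chip memory accesses regular:

```python
import numpy as np

def block_prune(weights: np.ndarray, block: int, keep_ratio: float) -> np.ndarray:
    # Prune whole blocks of `block` consecutive weights per row, keeping the
    # `keep_ratio` fraction of blocks with the largest L1 norm. Block-granular
    # pruning matches a `block`-wide datapath, so surviving work stays dense.
    rows, cols = weights.shape
    assert cols % block == 0, "row length must be a multiple of the block size"
    blocks = weights.reshape(rows, cols // block, block)
    scores = np.abs(blocks).sum(axis=-1)            # L1 norm per block
    n_keep = max(1, int(keep_ratio * scores.shape[1]))
    # Per row, zero every block except the n_keep highest-scoring ones.
    cutoff = np.sort(scores, axis=1)[:, -n_keep][:, None]
    mask = (scores >= cutoff)[..., None]
    return (blocks * mask).reshape(rows, cols)

w = np.random.default_rng(1).standard_normal((4, 16)).astype(np.float32)
w_bp = block_prune(w, block=8, keep_ratio=0.5)  # one of two 8-wide blocks kept per row
```

Because every surviving block is dense and aligned to the datapath width, the hardware never stalls on irregular zero patterns, which is exactly the imbalance that unstructured sparsity creates.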
“…The first generation of CNN implementations took performance as the main optimization metric. Recently, a few works based on the single-module approach [22]–[24] have started to consider other metrics, such as area and power, to enable design trade-offs.…”
Section: Related Work (mentioning)
confidence: 99%
“…In [22], only convolutional layers are considered, while in [23], [24], convolutional and fully-connected layers are considered, but the same module is used for both. Instead of using the same hardware module for all layers, pipelines of layer-specific modules have been proposed [25], [26].…”
Section: Related Work (mentioning)
confidence: 99%