2021
DOI: 10.3390/electronics10182272

An Efficient FPGA-Based Convolutional Neural Network for Classification: Ad-MobileNet

Abstract: Convolutional Neural Networks (CNN) continue to dominate research in the area of hardware acceleration using Field Programmable Gate Arrays (FPGA), proving their effectiveness in a variety of computer vision applications such as object segmentation, image classification, face detection, and traffic sign recognition, among others. However, there are numerous constraints for deploying CNNs on FPGA, including limited on-chip memory, CNN size, and configuration parameters. This paper introduces Ad-MobileNet, an adv…

Cited by 31 publications (11 citation statements). References 40 publications.
“…Furthermore, since the number of input and output channels of the convolutional layer in a convolutional neural network is usually a multiple of 32, general FPGA deployment involves setting up a computing module with 32 × 32 input and output channel parallelism as a basic processing unit for convolutional computation [55][56][57][58] and then considering whether to continue to stack this basic unit according to the actual situation. Therefore, we also set up a 32-way parallel processing convolutional computation basic processing unit to compare the three methods to verify the performance of our method when completing basic parallel convolutional computation, and Table 12 shows the hardware-resource occupation of our method compared with the other two methods at 32-channel parallelism.…”
Section: Results Analysis
confidence: 99%
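The citing statement above describes the common FPGA deployment pattern of using a 32 × 32 input/output channel-parallel computing module as the basic unit of convolutional computation. As a rough software analogue, the sketch below (a hypothetical illustration, not code from the cited works) tiles the channels of a 1 × 1 convolution into 32 × 32 blocks, where each inner block product corresponds to one pass of such a basic processing unit:

```python
import numpy as np

TILE = 32  # 32x32 input/output channel parallelism of one basic processing unit


def conv1x1_tiled(x, w):
    """1x1 convolution computed by tiling channels into TILE x TILE blocks,
    mimicking how an FPGA accelerator stacks a 32-way parallel basic unit.

    x: feature map of shape (C_in, H, W)
    w: weights of shape (C_out, C_in)
    C_in and C_out are assumed to be multiples of TILE, as in the quoted text.
    """
    c_in, h, wd = x.shape
    c_out = w.shape[0]
    y = np.zeros((c_out, h, wd))
    for oc in range(0, c_out, TILE):          # iterate over output-channel tiles
        for ic in range(0, c_in, TILE):       # iterate over input-channel tiles
            # this 32x32 block product is what one basic unit computes in parallel
            y[oc:oc + TILE] += np.einsum(
                'oi,ihw->ohw',
                w[oc:oc + TILE, ic:ic + TILE],
                x[ic:ic + TILE],
            )
    return y
```

In hardware, "stacking" the basic unit corresponds to instantiating more of these tile engines so several (oc, ic) blocks are processed concurrently; the loop structure stays the same.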
“…In most cases, the number of input and output channels of convolutional layers in CNNs was between 32 and 512. Therefore, in the FPGA deployment work, regardless of the method followed to accelerate the CNN computation, a convolutional computation accelerator architecture must be built with at least 32 convolutional computation modules in parallel [ 51 , 52 , 53 , 54 , 55 ]. Therefore, we also performed 32-channel parallel processing for the designed modules and compared the results of the four methods to verify the performance of the proposed method in real applications.…”
Section: Methods
confidence: 99%
“…Xiong et al [217] developed an FPGA-based CNN accelerator to improve the automatic segmentation of 3D brain tumors. FPGA-based accelerators are also used to implement various applications such as autonomous driving [105], [129], image classification [45], [70], fraud detection [128], cancer detection [186], etc. Table 2 summarizes the reviewed FPGAbased accelerators for specific applications.…”
Section: A. Accelerators for a Specific Application
confidence: 99%