2021
DOI: 10.1109/tcsii.2021.3095283
An FPGA-Based Energy-Efficient Reconfigurable Convolutional Neural Network Accelerator for Object Recognition Applications

Cited by 43 publications (16 citation statements)
References 20 publications
“…For other designs, the works [14,37,38] all used parameterizable configuration and data multiplexing: the authors of [14] optimised the instructions and used the ping-pong storage method, while [38] reduced the number of data accesses through kernel partitioning. The works [38,39] implemented their hardware using advanced chip technology on Intel FPGAs. The authors of [39] constructed a generic CNN compiler that generates customised FPGA hardware for different CNN inference tasks.…”
Section: Experimental Assessment and Resultsmentioning
confidence: 99%
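The ping-pong storage method mentioned above can be illustrated with a minimal software sketch. This is a generic double-buffering scheme, not the implementation from [14]: while the compute stage consumes one buffer, the load stage fills the other, so on an FPGA the two stages overlap and memory latency is hidden. All names (`pingpong_sum`, `TILE`, etc.) are hypothetical.

```c
#include <string.h>

#define TILE 4

/* Two on-chip buffers that alternate roles each iteration
 * ("ping" is consumed while "pong" is being filled). */
static int buf[2][TILE];

/* Stand-in for a DMA transfer from external memory. */
static void load_tile(const int *src, int *dst) {
    memcpy(dst, src, TILE * sizeof(int));
}

/* Stand-in for the compute stage (here: a tile sum). */
static int compute_tile(const int *t) {
    int s = 0;
    for (int i = 0; i < TILE; i++) s += t[i];
    return s;
}

int pingpong_sum(const int *data, int ntiles) {
    int total = 0;
    load_tile(data, buf[0]);              /* prime the first buffer     */
    for (int t = 0; t < ntiles; t++) {
        int cur = t & 1;                  /* buffer being consumed      */
        if (t + 1 < ntiles)               /* prefetch into the other    */
            load_tile(data + (t + 1) * TILE, buf[cur ^ 1]);
        total += compute_tile(buf[cur]);  /* in hardware, load and
                                             compute run concurrently   */
    }
    return total;
}
```

In software the two stages run sequentially; the point of the pattern is that synthesized hardware (or a DMA engine) can execute them in parallel because they never touch the same buffer in the same iteration.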
See 1 more Smart Citation
“…For other designs, the works [ 14 , 37 , 38 ] all used parameterizable configuration and data multiplexing, in which the authors of [ 14 ] optimised the instructions, used the ping-pong storage method, and [ 38 ] reduced the number of data accesses through kernel partitioning. The works of [ 38 , 39 ] implemented hardware using advanced chip technology on Intel FPGAs. The authors of [ 39 ] constructed a generic CNN compiler to generate customised FPGA hardware for different CNN inference tasks.…”
Section: Experimental Assessment and Resultsmentioning
confidence: 99%
“…Simultaneously, the accelerator's LUT usage was significantly reduced because the binary convolution computation uses fewer register resources. The accelerator performance at the convolutional layer was 35.66 GOPS/W, 1.11 times that of [41] and 1.29 times that of [38]. The overall neural network accelerator performance was 29.36 GOPS/W, 2.07 times that of [37], 1.37 times that of [14], 1.21 times that of [39] and 1.5 times that of [40].…”
Section: Experimental Assessment and Resultsmentioning
confidence: 99%
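The resource saving attributed to binary convolution above follows from how binarized networks compute dot products. A standard sketch of the idea (not the cited accelerator's implementation, and `binary_dot` is a hypothetical name): with +1/-1 values packed as 1/0 bits, a 32-element multiply-accumulate collapses into one XOR and one popcount, needing no DSP multipliers and far fewer registers than fixed-point MACs.

```c
#include <stdint.h>

/* Software popcount; in FPGA fabric this is a small adder tree. */
static int popcount32(uint32_t x) {
    int c = 0;
    while (x) { x &= x - 1; c++; }
    return c;
}

/* Dot product of two n-element {+1,-1} vectors packed as bits.
 * Matched bits contribute +1, mismatched bits -1, so
 * dot = n - 2 * popcount(act XOR wgt). */
int binary_dot(uint32_t act, uint32_t wgt, int n) {
    uint32_t mask = (n < 32) ? ((1u << n) - 1u) : 0xFFFFFFFFu;
    int mismatches = popcount32((act ^ wgt) & mask);
    return n - 2 * mismatches;
}
```

For example, identical 4-bit vectors give a dot product of 4, fully opposite vectors give -4, and two mismatches out of four give 0 — one bitwise operation replacing four signed multiplications.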
“…Over the last few decades, FPGA boards have grown in prominence, particularly in the fields of machine vision and robotics [29,30]. FPGA technology has a number of advantages over software running on a CPU or GPU, including faster execution [31] and lower power consumption [32]. In the suggested scheme, the FPGA hardware, together with a high-performance conventional server, could be housed in the cloud.…”
Section: Resultsmentioning
confidence: 99%
“…The AI co-processors (CPs), such as Field Programmable Gate Arrays (FPGAs), GPUs, and Application-Specific Integrated Circuits (ASICs), handle the AI workload and leave the rest of the task to the CPU [6,7]. The FPGA has been treated as a promising solution to supplant conventional processors for computation-intensive tasks [8-11], such as deep neural network (DNN)-based image recognition, on UAV platforms without violating the size, weight, and power constraints inherent to UAV design [12-15]. Unless otherwise stated, we adopt FPGAs as the CPs in the following analysis.…”
Section: Introductionmentioning
confidence: 99%