2022
DOI: 10.1155/2022/8039281

Optimizing the Deep Neural Networks by Layer-Wise Refined Pruning and the Acceleration on FPGA

Abstract: To accelerate the practical applications of artificial intelligence, this paper proposes a highly efficient layer-wise refined pruning method for deep neural networks at the software level and accelerates the inference process at the hardware level on a field-programmable gate array (FPGA). The refined pruning operation is based on the channel-wise importance indexes of each layer and the layer-wise input sparsity of the convolutional layers. The method utilizes the characteristics of the native networks without int…
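The abstract names two signals that drive the pruning decision: channel-wise importance indexes per layer and the layer-wise input sparsity of convolutional layers. The exact importance index is cut off in the visible text, so the following is only a minimal sketch of structured channel pruning, assuming the index is the L1 norm of each output channel's filter weights; the per-layer keep ratio stands in for whatever the paper derives from input sparsity.

```python
import numpy as np

def channel_importance(weights: np.ndarray) -> np.ndarray:
    """Assumed importance index: L1 norm of each output channel's filters.
    weights has shape (C_out, C_in, kH, kW)."""
    return np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)

def prune_layer(weights: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Zero out the least-important output channels of one conv layer
    (structured, channel-wise pruning)."""
    scores = channel_importance(weights)
    n_keep = max(1, int(round(keep_ratio * len(scores))))
    keep = np.argsort(scores)[-n_keep:]  # indexes of surviving channels
    mask = np.zeros(len(scores), dtype=bool)
    mask[keep] = True
    pruned = weights.copy()
    pruned[~mask] = 0.0
    return pruned

# Hypothetical conv layer; in the paper the keep ratio would vary per layer.
rng = np.random.default_rng(0)
conv_w = rng.normal(size=(64, 32, 3, 3))
pruned_w = prune_layer(conv_w, keep_ratio=0.3)
zeroed = int((np.abs(pruned_w).sum(axis=(1, 2, 3)) == 0).sum())
print(f"zeroed channels: {zeroed} / {conv_w.shape[0]}")
```

Channels are zeroed rather than physically removed to keep the sketch short; a real implementation would also drop the matching input channels of the following layer to realize the FLOP savings.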

Cited by 20 publications (7 citation statements). References 38 publications.

“…In terms of the dataset ImageNet100 (Li et al., 2022), it is a subset of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC 2012) dataset for evaluating the performance of DNNs, comprising 100 classes with 129,026 items randomly selected from ILSVRC 2012. For the experiments in this research, ImageNet100 is divided into three parts: the training set, validation set, and test set, in a proportion of 16:4:5.…”
Section: Methods
confidence: 99%
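The 16:4:5 proportion quoted above fixes the split sizes only up to rounding; a quick worked check (the handling of the one-item rounding remainder is an assumption):

```python
# 16:4:5 split of the 129,026 ImageNet100 items quoted above.
total = 129_026
parts = {"train": 16, "val": 4, "test": 5}
denom = sum(parts.values())  # 25
sizes = {k: total * v // denom for k, v in parts.items()}
sizes["test"] += total - sum(sizes.values())  # assign rounding remainder to test
print(sizes)  # {'train': 82576, 'val': 20644, 'test': 25806}
```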
“…In many real-world applications, object detection must be performed in a timely and power-saving manner under computational resource constraints. Many other vision tasks have built lightweight models using methods such as weight quantization [16], [17], network compression [18], and computationally efficient architecture design [19], [20], [21]. For some vision tasks, lightweight networks aim to achieve the best tradeoff between accuracy and efficiency, showing their superiority by reducing model size and FLOPs with little performance drop [22].…”
Section: B. Lightweight Object Detection Model
confidence: 99%
“…References [117][118][119] all adopted mixed-precision quantization, applying different quantization strategies according to different data accuracy requirements to achieve lower latency and higher inference accuracy. In [120], layer-wise refined pruning was used to optimize VGG13BN and ResNet101, achieving less than 1% accuracy loss and a large speedup while pruning away more than 70% of the parameters and floating-point operations. Some scholars combined pruning and quantization: they compressed the model with a hybrid pruning method, reduced the data bit width to 8 bits through quantization, and designed an FPGA accelerator that makes the CNN more flexible, more configurable, and higher performing.…”
Section: The CNN Accelerator Based on FPGA
confidence: 99%
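The 8-bit quantization step described in the last statement can be illustrated with a minimal symmetric, per-tensor quantization sketch; the cited accelerators' actual schemes (per-channel scales, mixed precision, calibration) are not specified here, so every detail below is an assumption.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor quantization of float32 data to int8."""
    scale = float(np.max(np.abs(x))) / 127.0
    if scale == 0.0:  # all-zero tensor: any scale works
        scale = 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

# Hypothetical weight tensor; real accelerators quantize activations too.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(64, 32)).astype(np.float32)
q, s = quantize_int8(w)
err = float(np.abs(dequantize(q, s) - w).max())
print(f"max abs quantization error: {err:.5f}")
```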