2019 IEEE International Symposium on Workload Characterization (IISWC)
DOI: 10.1109/iiswc47752.2019.9042000
Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs

Abstract: Convolutional Neural Networks (CNN) are becoming a common presence in many applications and services, due to their superior recognition accuracy. They are increasingly being used on mobile devices, many times just by porting large models designed for server space, although several model compression techniques have been considered. One model compression technique intended to reduce computations is channel pruning. Mobile and embedded systems now have GPUs which are ideal for the parallel computations of neural …

Cited by 37 publications (12 citation statements).
References 25 publications (24 reference statements).
“…Each pixel of the image is fed as input to a neuron of the first layer. Neurons of one layer are connected to neurons of the next layer through channels, each of which is assigned a numerical value known as a weight (64). The inputs are multiplied by the corresponding weights and their sum is sent as input to the neurons in the hidden layer (63,64,65).…”
Section: Convolution Neural Network Computation and Training Methodology
Confidence: 99%
“…Neurons of one layer are connected to neurons of the next layer through channels, each of which is assigned a numerical value known as a weight (64). The inputs are multiplied by the corresponding weights and their sum is sent as input to the neurons in the hidden layer (63,64,65). Each of these neurons is associated with a numerical value called the bias, which is then added to the input sum (64,65).…”
Section: Convolution Neural Network Computation and Training Methodology
Confidence: 99%
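The computation the citing passage describes — inputs multiplied by weights, summed, then offset by a per-neuron bias — can be sketched in a few lines of NumPy. The layer sizes and values below are illustrative only, not taken from the paper:

```python
import numpy as np

def dense_layer(inputs, weights, biases):
    """One layer's computation as described in the citation:
    multiply inputs by the corresponding weights, sum them,
    and add each neuron's bias to the input sum.

    inputs:  (n_in,)         activations from the previous layer
    weights: (n_out, n_in)   one weight per input-to-neuron channel
    biases:  (n_out,)        one bias per neuron in this layer
    """
    return weights @ inputs + biases

# Toy example: 3 input pixels feeding 2 hidden neurons.
x = np.array([0.5, 0.2, 0.1])
W = np.array([[0.1, 0.4, 0.6],
              [0.3, 0.2, 0.5]])
b = np.array([0.05, -0.1])
print(dense_layer(x, W, b))  # → [0.24 0.14]
```

In a CNN the same multiply-accumulate-plus-bias pattern is applied, but the weights are shared across spatial positions via convolution kernels rather than being fully connected.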
“…Recent work explores the performance trade-offs between reduced precision of neural networks and their speed on GPUs, e.g., performance aware pruning can lead to 3-10 times speedups [37]. Multi-precision FPGA hardware for neural networks significantly reduces model sizes, which in [38] enables an ImageNet network to fit entirely on-chip for the first time, significantly speeding up throughput.…”
Section: Device Specific Quantisation
Confidence: 99%
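Channel pruning of the kind this citation refers to removes whole output channels (convolution filters) so the remaining computation stays dense and GPU-friendly. A generic sketch, ranking filters by L1 norm — a common criterion, not necessarily the paper's performance-aware one:

```python
import numpy as np

def prune_channels(kernels, keep_ratio=0.5):
    """Keep the fraction of output channels with the largest L1 norm.

    kernels: (out_channels, in_channels, kH, kW) convolution weights
    Returns the pruned kernel tensor and the sorted indices kept.
    """
    # L1 norm of each filter, flattened across input channels and space.
    norms = np.abs(kernels).reshape(kernels.shape[0], -1).sum(axis=1)
    n_keep = max(1, int(round(kernels.shape[0] * keep_ratio)))
    # Indices of the n_keep strongest filters, in ascending channel order.
    keep = np.sort(np.argsort(norms)[-n_keep:])
    return kernels[keep], keep
```

Because entire filters are dropped, the pruned layer is still a smaller dense convolution, which is what makes this form of compression attractive on GPU hardware, as opposed to unstructured weight sparsity.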
“…A problem with existing GEMM-based implementations of dense convolution operations is that they are only optimized for kernel matrices with a multiple of 32 rows [38]. This is not a problem for unpruned convolutions, since the numbers of kernels in CNN models are normally multiples of 32.…”
Section: Achieving Linear Performance For Arbitrary Matrix Size
Confidence: 99%
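One workaround for the multiple-of-32 constraint described above is to zero-pad the pruned kernel matrix back up to the next multiple of 32 rows before the GEMM, then discard the padded output rows. A minimal sketch (the function name and zero-padding strategy are illustrative assumptions, not the cited work's method):

```python
import numpy as np

def pad_rows_to_multiple(kernel_matrix, multiple=32):
    """Zero-pad a 2D kernel matrix so its row count is a multiple
    of `multiple`, e.g. so a pruned layer still hits the GEMM fast path."""
    rows, cols = kernel_matrix.shape
    pad = (-rows) % multiple  # rows needed to reach the next multiple
    if pad == 0:
        return kernel_matrix
    return np.vstack([kernel_matrix, np.zeros((pad, cols), kernel_matrix.dtype)])

# A layer pruned from 64 to 45 kernels gets padded back to 64 rows.
padded = pad_rows_to_multiple(np.ones((45, 9)))
print(padded.shape)  # → (64, 9)
```

The trade-off is that the padded rows waste compute, which is why the cited section instead pursues linear performance for arbitrary matrix sizes.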