2019 IEEE International Symposium on Workload Characterization (IISWC)
DOI: 10.1109/iiswc47752.2019.9042000
Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs

Abstract: Convolutional Neural Networks (CNN) are becoming a common presence in many applications and services, due to their superior recognition accuracy. They are increasingly being used on mobile devices, many times just by porting large models designed for server space, although several model compression techniques have been considered. One model compression technique intended to reduce computations is channel pruning. Mobile and embedded systems now have GPUs which are ideal for the parallel computations of neural …

Cited by 37 publications (12 citation statements).
References 25 publications (24 reference statements).
“…Each pixel of the image is fed as input to a neuron of the first layer. Neurons of one layer are connected to neurons of the next layer through channels, each of which is assigned a numerical value known as a weight (64). The inputs are multiplied by the corresponding weights and their sum is sent as input to the neurons in the hidden layer (63,64,65).…”
Section: Convolution Neural Network Computation and Training Methodology
Confidence: 99%
“…Neurons of one layer are connected to neurons of the next layer through channels, each of which is assigned a numerical value known as a weight (64). The inputs are multiplied by the corresponding weights and their sum is sent as input to the neurons in the hidden layer (63,64,65). Each of these neurons is associated with a numerical value called the bias, which is then added to the input sum (64,65).…”
Section: Convolution Neural Network Computation and Training Methodology
Confidence: 99%
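The computation the citing passage describes — inputs multiplied by weights, summed, then offset by a per-neuron bias — can be sketched in a few lines of NumPy. The layer sizes and values below are illustrative only, not taken from the paper:

```python
import numpy as np

def dense_layer(inputs, weights, biases):
    """One layer's computation as described in the citation:
    multiply inputs by the corresponding weights, sum them,
    and add each neuron's bias to the input sum.

    inputs:  (n_in,)         activations from the previous layer
    weights: (n_out, n_in)   one weight per input-to-neuron channel
    biases:  (n_out,)        one bias per neuron in this layer
    """
    return weights @ inputs + biases

# Toy example: 3 input pixels feeding 2 hidden neurons.
x = np.array([0.5, 0.2, 0.1])
W = np.array([[0.1, 0.4, 0.6],
              [0.3, 0.2, 0.5]])
b = np.array([0.05, -0.1])
print(dense_layer(x, W, b))  # → [0.24 0.14]
```

In a CNN the same multiply-accumulate-plus-bias pattern is applied, but the weights are shared across spatial positions via convolution kernels rather than being fully connected.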
“…Recent work explores the performance trade-offs between reduced precision of neural networks and their speed on GPUs, e.g., performance aware pruning can lead to 3-10 times speedups [37]. Multi-precision FPGA hardware for neural networks significantly reduces model sizes, which in [38] enables an ImageNet network to fit entirely on-chip for the first time, significantly speeding up throughput.…”
Section: Device Specific Quantisation
Confidence: 99%
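Channel pruning of the kind this citation refers to removes whole output channels (convolution filters) so the remaining computation stays dense and GPU-friendly. A generic sketch, ranking filters by L1 norm — a common criterion, not necessarily the paper's performance-aware one:

```python
import numpy as np

def prune_channels(kernels, keep_ratio=0.5):
    """Keep the fraction of output channels with the largest L1 norm.

    kernels: (out_channels, in_channels, kH, kW) convolution weights
    Returns the pruned kernel tensor and the sorted indices kept.
    """
    # L1 norm of each filter, flattened across input channels and space.
    norms = np.abs(kernels).reshape(kernels.shape[0], -1).sum(axis=1)
    n_keep = max(1, int(round(kernels.shape[0] * keep_ratio)))
    # Indices of the n_keep strongest filters, in ascending channel order.
    keep = np.sort(np.argsort(norms)[-n_keep:])
    return kernels[keep], keep
```

Because entire filters are dropped, the pruned layer is still a smaller dense convolution, which is what makes this form of compression attractive on GPU hardware, as opposed to unstructured weight sparsity.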
“…A problem with existing GEMM-based implementations of dense convolution operations is that they are only optimized for kernel matrices with a multiple of 32 rows [38]. This is not a problem for unpruned convolutions, since the numbers of kernels in CNN models are normally multiples of 32.…”
Section: Achieving Linear Performance For Arbitrary Matrix Size
Confidence: 99%
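One workaround for the multiple-of-32 constraint described above is to zero-pad the pruned kernel matrix back up to the next multiple of 32 rows before the GEMM, then discard the padded output rows. A minimal sketch (the function name and zero-padding strategy are illustrative assumptions, not the cited work's method):

```python
import numpy as np

def pad_rows_to_multiple(kernel_matrix, multiple=32):
    """Zero-pad a 2D kernel matrix so its row count is a multiple
    of `multiple`, e.g. so a pruned layer still hits the GEMM fast path."""
    rows, cols = kernel_matrix.shape
    pad = (-rows) % multiple  # rows needed to reach the next multiple
    if pad == 0:
        return kernel_matrix
    return np.vstack([kernel_matrix, np.zeros((pad, cols), kernel_matrix.dtype)])

# A layer pruned from 64 to 45 kernels gets padded back to 64 rows.
padded = pad_rows_to_multiple(np.ones((45, 9)))
print(padded.shape)  # → (64, 9)
```

The trade-off is that the padded rows waste compute, which is why the cited section instead pursues linear performance for arbitrary matrix sizes.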