Proceedings of the 48th International Conference on Parallel Processing 2019
DOI: 10.1145/3337821.3337839

A Unified Optimization Approach for CNN Model Inference on Integrated GPUs

Abstract: Modern deep learning applications increasingly push model inference to the edge devices, for reasons such as achieving shorter latency, relieving the burden on the network connection to the cloud, and protecting user privacy. The Convolutional Neural Network (CNN) is one of the most widely used model families in these applications. Given the high computational complexity of CNN models, it is favorable to execute them on the integrated GPUs at the edge devices, which are ubiquitous and have…
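The abstract motivates running CNN inference on the integrated GPUs found in edge devices. As a minimal sketch of that setting, not code taken from the paper, the following compiles an ONNX model for an integrated GPU's OpenCL backend with Apache TVM; the model file "resnet18.onnx", the input name "data", and the input shape are placeholder assumptions.

# Hedged sketch (not from the paper): compile and run a CNN on an
# integrated GPU via TVM's OpenCL backend.
import onnx
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor

onnx_model = onnx.load("resnet18.onnx")  # placeholder model file
mod, params = relay.frontend.from_onnx(
    onnx_model, shape={"data": (1, 3, 224, 224)})  # assumed input name/shape

# OpenCL is the backend integrated GPUs (Intel, ARM Mali, AMD APU) commonly expose.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="opencl", params=params)

dev = tvm.device("opencl", 0)
runtime = graph_executor.GraphModule(lib["default"](dev))
runtime.set_input("data", np.random.rand(1, 3, 224, 224).astype("float32"))
runtime.run()
out = runtime.get_output(0).numpy()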

Cited by 28 publications (10 citation statements) · References 26 publications

Citation statements, ordered by relevance:
“…GPUs are increasingly being used both in training and inference of the CNNs architectures due to their high-performance capability of processing vectored data, making them a perfect fit for CNNs [27,62]. Then, as our case-study scenarios involve edge devices, we looked into energy-efficient GPUs for mobile and edge computing.…”
Section: Performance Comparison - CPU vs GPU
Citation type: mentioning; confidence: 99%
“…As a result, it may be hard for a GPU SparseTrain implementation to beat the Tensor Core accelerated GEMM. Nevertheless, the method can be useful on GPUs without a hardware GEMM accelerator (e.g., the integrated GPUs used for inference on edge devices [49]), or when we desire higher precision than the one supported by the accelerator.…”
Section: Generalization to Other Hardware
Citation type: mentioning; confidence: 99%
“…Besides, constructing a quality template requires expertise in both tensor operators and hardware. It takes non-trivial research efforts [29,50,53] to develop quality templates. Despite the huge efforts in developing templates, existing manual templates only cover limited program structures because manually enumerating all optimization choices for all operators is prohibitive.…”
Section: Introduction
Citation type: mentioning; confidence: 99%
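To make the "manual template" notion in the statement above concrete: a template fixes an operator's loop structure and exposes only a few tunable knobs (e.g., tile sizes) for an auto-tuner to enumerate; loop structures outside the template are never searched. The following is a hypothetical plain-Python sketch of the idea, not any particular system's API:

import itertools
import time
import numpy as np

def blocked_matmul(A, B, ti, tj, tk):
    """A fixed 'template' loop nest; only the tile sizes are tunable."""
    M, K = A.shape
    _, N = B.shape
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, ti):
        for j in range(0, N, tj):
            for k in range(0, K, tk):
                C[i:i+ti, j:j+tj] += A[i:i+ti, k:k+tk] @ B[k:k+tk, j:j+tj]
    return C

A = np.random.rand(256, 256).astype(np.float32)
B = np.random.rand(256, 256).astype(np.float32)

# The tuner's search space is exactly the knobs the template exposes.
best = None
for ti, tj, tk in itertools.product([32, 64, 128], repeat=3):
    t0 = time.perf_counter()
    blocked_matmul(A, B, ti, tj, tk)
    dt = time.perf_counter() - t0
    if best is None or dt < best[0]:
        best = (dt, (ti, tj, tk))
print("best tile sizes:", best[1])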