2020
DOI: 10.48550/arxiv.2008.03602
Preprint

Spatial Sharing of GPU for Autotuning DNN models

Abstract: GPUs are used for training, inference, and tuning of machine learning models. However, Deep Neural Network (DNN) models vary widely in their ability to exploit the full power of high-performance GPUs. Spatial sharing of a GPU enables multiplexing several applications on the GPU, which can improve its utilization, thus improving throughput and lowering latency. DNN models given just the right amount of GPU resources can still provide low inference latency, just as much as dedicating all of the GPU for their …
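The abstract does not spell out the sharing mechanism, but NVIDIA's Multi-Process Service (MPS) with a capped active-thread percentage is one common way to carve out a spatial slice of a GPU. Below is a minimal sketch of that idea: CUDA_MPS_ACTIVE_THREAD_PERCENTAGE is a real MPS knob on Volta and later GPUs, while the worker script names and percentages are hypothetical, chosen only for illustration.

```python
import os
import subprocess

# Hypothetical inference workers; the names and SM percentages are
# illustrative, not taken from the paper.
WORKERS = [
    ("worker_small_model.py", "30"),   # model that saturates ~30% of SMs
    ("worker_large_model.py", "70"),   # model that uses the remaining ~70%
]

procs = []
for script, sm_pct in WORKERS:
    env = os.environ.copy()
    # Under NVIDIA MPS, this caps the fraction of SMs a client process
    # may use, giving each worker its own spatial slice of the GPU.
    env["CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"] = sm_pct
    procs.append(subprocess.Popen(["python", script], env=env))

for p in procs:
    p.wait()
```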

Cited by 1 publication (2 citation statements) | References 8 publications
“…Such an assumption and the single-tenant-targeted configuration, however, become unsuitable for concurrent multi-kernel execution, since each kernel only has partial resources available. According to [9], the maximum throughput gap between single-tenant and multi-tenant tuned configurations for the same computing kernel can reach a 5× difference.…”
Section: MT-DL Computing Stack
confidence: 99%
“…However, as multi-tenant DNNs share the underlying resources, kernels optimized for single-tenant settings can easily become sub-optimal in multi-tenant scenarios. Recent work shows that multi-tenant DL computing should optimize kernel configurations according to the resource ratio actually available during execution [9], reporting a 5× throughput difference. (5) Resource-level: achieving adaptive multi-tenant resource partitioning and provisioning requires both strategic design and hardware support.…”
Section: A. Challenges for Multi-Tenant DL Computing
confidence: 99%
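Both quoted statements make the same point: a kernel configuration tuned under exclusive GPU access can be far from optimal when only a fraction of the resources is available, so the tuner should treat the resource ratio as part of the search key. A minimal sketch of that idea follows; the search space and cost model are hypothetical placeholders standing in for real on-GPU measurements.

```python
import itertools

# Tile-size search space for a hypothetical GEMM-like kernel; the
# configs and cost model below are illustrative, not from the paper.
SEARCH_SPACE = list(itertools.product([16, 32, 64, 128], repeat=2))

def benchmark(config, resource_ratio):
    """Hypothetical stand-in for an on-GPU measurement: in practice this
    would run the kernel inside a spatial slice of the GPU sized to
    resource_ratio and return the measured latency."""
    tile_m, tile_n = config
    p = tile_m * tile_n
    # Placeholder cost model: large tiles amortize overhead when the
    # whole GPU is available, but over-subscribe a small slice.
    return 1000.0 / p + p * (1.0 - resource_ratio) / 500.0

_BEST_BY_RATIO = {}

def tune(resource_ratio):
    """Cache one best config per resource ratio, instead of a single
    'best' config tuned under exclusive GPU access."""
    if resource_ratio not in _BEST_BY_RATIO:
        _BEST_BY_RATIO[resource_ratio] = min(
            SEARCH_SPACE, key=lambda c: benchmark(c, resource_ratio))
    return _BEST_BY_RATIO[resource_ratio]

# The config tuned for the full GPU differs from the one tuned for a
# 30% slice; the citing papers report gaps of up to 5x in throughput.
print(tune(1.0))  # e.g. (128, 128)
print(tune(0.3))  # e.g. (16, 64)
```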