2020
DOI: 10.48550/arxiv.2008.03602
Preprint

Spatial Sharing of GPU for Autotuning DNN models

Abstract: GPUs are used for training, inference, and tuning of machine learning models. However, Deep Neural Network (DNN) models vary widely in their ability to exploit the full power of high-performance GPUs. Spatial sharing of a GPU enables multiplexing several applications on the GPU, which can improve its utilization, thus improving throughput and lowering latency. DNN models given just the right amount of GPU resources can still provide low inference latency, just as much as dedicating all of the GPU for their …
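The abstract does not spell out the sharing mechanism, but NVIDIA's Multi-Process Service (MPS) with a capped active-thread percentage is one common way to carve out a spatial slice of a GPU. Below is a minimal sketch of that idea: CUDA_MPS_ACTIVE_THREAD_PERCENTAGE is a real MPS knob on Volta and later GPUs, while the worker script names and percentages are hypothetical, chosen only for illustration.

```python
import os
import subprocess

# Hypothetical inference workers; the names and SM percentages are
# illustrative, not taken from the paper.
WORKERS = [
    ("worker_small_model.py", "30"),   # model that saturates ~30% of SMs
    ("worker_large_model.py", "70"),   # model that uses the remaining ~70%
]

procs = []
for script, sm_pct in WORKERS:
    env = os.environ.copy()
    # Under NVIDIA MPS, this caps the fraction of SMs a client process
    # may use, giving each worker its own spatial slice of the GPU.
    env["CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"] = sm_pct
    procs.append(subprocess.Popen(["python", script], env=env))

for p in procs:
    p.wait()
```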

Cited by 1 publication (2 citation statements) | References 8 publications
“…Such an assumption and the single-tenant-targeted configuration, however, become unsuitable for concurrent multi-kernel execution, since each kernel only has partial resources available. According to [9], the maximum throughput gap between single-tenant and multi-tenant tuned configurations for the same computing kernel can reach a 5× difference.…”
Section: MT-DL Computing Stack
confidence: 99%
“…However, as multi-tenant DNNs share the underlying resources, kernels optimized for single-tenant settings can easily become sub-optimal in multi-tenant scenarios. Recent work shows that multi-tenant DL computing should optimize kernel configurations according to the resource ratio actually available during execution [9], reporting a 5× throughput difference. (5) Resource-level: achieving adaptive multi-tenant resource partitioning and provisioning requires both strategic design and hardware support.…”
Section: A. Challenges for Multi-Tenant DL Computing
confidence: 99%
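Both quoted statements make the same point: a kernel configuration tuned under exclusive GPU access can be far from optimal when only a fraction of the resources is available, so the tuner should treat the resource ratio as part of the search key. A minimal sketch of that idea follows; the search space and cost model are hypothetical placeholders standing in for real on-GPU measurements.

```python
import itertools

# Tile-size search space for a hypothetical GEMM-like kernel; the
# configs and cost model below are illustrative, not from the paper.
SEARCH_SPACE = list(itertools.product([16, 32, 64, 128], repeat=2))

def benchmark(config, resource_ratio):
    """Hypothetical stand-in for an on-GPU measurement: in practice this
    would run the kernel inside a spatial slice of the GPU sized to
    resource_ratio and return the measured latency."""
    tile_m, tile_n = config
    p = tile_m * tile_n
    # Placeholder cost model: large tiles amortize overhead when the
    # whole GPU is available, but over-subscribe a small slice.
    return 1000.0 / p + p * (1.0 - resource_ratio) / 500.0

_BEST_BY_RATIO = {}

def tune(resource_ratio):
    """Cache one best config per resource ratio, instead of a single
    'best' config tuned under exclusive GPU access."""
    if resource_ratio not in _BEST_BY_RATIO:
        _BEST_BY_RATIO[resource_ratio] = min(
            SEARCH_SPACE, key=lambda c: benchmark(c, resource_ratio))
    return _BEST_BY_RATIO[resource_ratio]

# The config tuned for the full GPU differs from the one tuned for a
# 30% slice; the citing papers report gaps of up to 5x in throughput.
print(tune(1.0))  # e.g. (128, 128)
print(tune(0.3))  # e.g. (16, 64)
```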