2023
DOI: 10.1109/tpds.2022.3232715
iGniter: Interference-Aware GPU Resource Provisioning for Predictable DNN Inference in the Cloud

Cited by 24 publications (2 citation statements)
References 28 publications
“…However, Scrooge tackles inference DL workloads. iGniter [22], which also focuses on inference jobs, is an interference-aware resource provisioning framework that allows spatial GPU sharing and jointly optimizes resource allocation and scheduling. The authors model the performance interference analytically, and iGniter determines the optimal batch size, a lower bound on GPU resources, and a greedy placement that minimizes interference.…”
Section: Related Work
confidence: 99%
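
The statement above summarizes iGniter's pipeline: an analytical interference model, a per-workload lower bound on GPU resources, and a greedy placement that minimizes interference. As a rough illustration only, the Python sketch below mimics that greedy step; the `Workload` fields, the `interference_penalty` model, and the capacity accounting are hypothetical stand-ins, not iGniter's actual formulation.

```python
from dataclasses import dataclass, field

@dataclass
class Workload:
    """A DNN inference service to be placed (fields are illustrative)."""
    name: str
    gpu_fraction: float  # assumed lower bound on GPU resources meeting its SLO

@dataclass
class GPU:
    gpu_id: int
    capacity: float = 1.0                       # a whole GPU is 1.0
    placed: list = field(default_factory=list)  # workloads co-located here

    def free(self) -> float:
        return self.capacity - sum(w.gpu_fraction for w in self.placed)

def interference_penalty(w: Workload, gpu: GPU) -> float:
    """Hypothetical stand-in for an analytical interference model:
    here, interference simply grows with the co-located demand."""
    return sum(co.gpu_fraction for co in gpu.placed) * w.gpu_fraction

def greedy_place(workloads: list[Workload], gpus: list[GPU]) -> dict[str, int]:
    """Greedily place each workload on the feasible GPU with the
    smallest estimated interference increase (single pass)."""
    placement: dict[str, int] = {}
    # Place the most demanding workloads first, while options remain.
    for w in sorted(workloads, key=lambda w: w.gpu_fraction, reverse=True):
        feasible = [g for g in gpus if g.free() >= w.gpu_fraction]
        if not feasible:
            raise RuntimeError(f"no GPU can host {w.name}")
        best = min(feasible, key=lambda g: interference_penalty(w, g))
        best.placed.append(w)
        placement[w.name] = best.gpu_id
    return placement

if __name__ == "__main__":
    gpus = [GPU(0), GPU(1)]
    jobs = [Workload("resnet50", 0.5), Workload("bert", 0.4), Workload("ssd", 0.3)]
    print(greedy_place(jobs, gpus))  # e.g. {'resnet50': 0, 'bert': 1, 'ssd': 1}
```

The greedy choice here only picks, per workload, the feasible GPU with the smallest modeled penalty; per the citation statement, iGniter's actual procedure additionally determines the optimal batch size and derives the GPU resource lower bounds from the workload SLOs.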
“…Note that, while the interference generated by node partitioning can be neglected (see, e.g., [11]–[15]), this is not generally true for GPU sharing. Nevertheless, we decided to consider this scenario, which is currently gaining popularity in the literature [19], [22], since the large memory of the most recent GPUs usually allows multiple mini-jobs to execute simultaneously, freeing resources for more demanding jobs. Moreover, GPU sharing is essential for effectively supporting the training of DL models for edge computing systems, where the available resources are limited.…”
Section: A Reference Framework
confidence: 99%