Proceedings of the ACM Symposium on Cloud Computing 2021
DOI: 10.1145/3472883.3486993
Scrooge

Cited by 20 publications (6 citation statements). References 13 publications.
“…Among these, GPU sharing is currently attracting considerable interest, owing to the large capacity of recent hardware. Two patterns can be considered: time sharing [18], in which jobs use the entire GPU for a limited fraction of time, and space sharing [19], in which jobs concurrently share a limited percentage of the GPU's resources. While we focus on space sharing in our work, both strategies are considered in this section.…”
Section: Related Work
confidence: 99%
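As a rough illustration of the two patterns described in the statement above, the toy Python sketch below contrasts time sharing (each job gets the whole GPU, one after another) with space sharing (jobs run concurrently, each on a fixed fraction of the GPU). The Job fields, the example workloads, and the assumption that runtime scales inversely with the allocated fraction are illustrative simplifications, not details taken from [18], [19], or the Scrooge paper.

from dataclasses import dataclass

@dataclass
class Job:
    name: str
    work: float          # GPU-seconds of work when given the full GPU
    gpu_fraction: float  # share requested under space sharing, in (0, 1]

def time_sharing_completion(jobs):
    """Jobs run one after another, each using the entire GPU."""
    t, finish = 0.0, {}
    for job in jobs:
        t += job.work            # full GPU: runtime equals the work itself
        finish[job.name] = t
    return finish

def space_sharing_completion(jobs):
    """All jobs run concurrently, each on a fixed slice of the GPU.
    Assumes runtime scales inversely with the allocated fraction
    (a simplification; real interference is more complex)."""
    assert sum(j.gpu_fraction for j in jobs) <= 1.0, "GPU over-subscribed"
    return {j.name: j.work / j.gpu_fraction for j in jobs}

jobs = [Job("A", work=10.0, gpu_fraction=0.5),
        Job("B", work=4.0, gpu_fraction=0.25)]
print(time_sharing_completion(jobs))   # {'A': 10.0, 'B': 14.0}
print(space_sharing_completion(jobs))  # {'A': 20.0, 'B': 16.0}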
“…Scrooge [19] proposes a Mixed-Integer Linear Programming formulation to find the minimum-cost GPU-accelerated Virtual Machines in the cloud that meet performance objectives. Similarly to our work, the problem is solved whenever a job joins or leaves.…”
Section: Related Work
confidence: 99%
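To give a flavour of what a minimum-cost, performance-constrained formulation of this kind can look like, the sketch below uses the PuLP library to choose how many VMs of each type to rent so that their aggregate throughput covers the current demand at minimum hourly cost. The VM catalogue, capacity figures, and single throughput constraint are hypothetical; this is only a minimal sketch of the problem shape, not Scrooge's actual MILP. In the spirit of the statement above, it would be re-solved whenever a job joins or leaves.

from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpStatus

# Hypothetical catalogue of GPU-accelerated VM types: (hourly cost, requests/s capacity)
vm_types = {"g4dn.xlarge": (0.526, 400),
            "g4dn.2xlarge": (0.752, 700),
            "p3.2xlarge": (3.06, 1500)}
demand = 900  # aggregate requests/s the currently running jobs must sustain

prob = LpProblem("min_cost_vm_selection", LpMinimize)
# n[v] = number of VMs of type v to rent (non-negative integer)
n = {v: LpVariable(f"n_{v}", lowBound=0, cat="Integer") for v in vm_types}

# Objective: minimise total hourly cost
prob += lpSum(vm_types[v][0] * n[v] for v in vm_types)
# Performance objective: provisioned capacity must cover the demand
prob += lpSum(vm_types[v][1] * n[v] for v in vm_types) >= demand

prob.solve()
print(LpStatus[prob.status], {v: int(n[v].value()) for v in vm_types})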
“…Batching and parallelism parameters are practical configuration knobs of ML inference services. Batching refers to aggregating multiple requests into one request and is widely adopted in GPU inference systems [16,23,33,35]. However, as shown in Figure 4, inference on CPU does not benefit substantially from batching in terms of throughput, while increasing the batch size leads to higher latency.…”
Section: Experimental Evaluation
confidence: 99%
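As a minimal sketch of the batching knob discussed above, the snippet below aggregates queued requests into a single model call, bounded by a maximum batch size and a waiting timeout. The batch_requests helper and the run_model callback are hypothetical names; the trade-off in the comments (higher throughput per call, but added queueing latency) is the one the statement describes.

import queue, time

def batch_requests(request_queue, run_model, max_batch_size=8, max_wait_s=0.01):
    """Aggregate individual requests into one batched model call.
    Larger batches raise GPU throughput but add queueing latency,
    since early requests wait for the batch to fill or for the timeout."""
    batch = [request_queue.get()]               # block until at least one request
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(request_queue.get(timeout=remaining))
        except queue.Empty:
            break
    return run_model(batch)                     # single inference call for the whole batch

# Example: five queued requests are served in one batched call.
q = queue.Queue()
for i in range(5):
    q.put(f"req-{i}")
print(batch_requests(q, run_model=lambda b: f"ran batch of {len(b)}"))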