Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems 2022
DOI: 10.1145/3503222.3507721

Astraea: towards QoS-aware and resource-efficient multi-stage GPU services

Abstract: Multi-stage user-facing applications on GPUs are widely used nowadays, and are often implemented as microservices. Prior research is not applicable to ensuring the QoS of GPU-based microservices because of their different communication patterns and shared-resource contention. We propose Astraea to manage GPU microservices considering the above factors. In Astraea, a microservice deployment policy is used to maximize the supported peak service load while ensuring the required QoS. To adaptively switch the commu…

Cited by: 18 publications (3 citation statements)
References: 54 publications

“…Space sharing is suitable when a single application cannot efficiently use the entire GPU, which is addressed by Gavel [25] and Gslice [69] in MPS sharing mode. Many other works have addressed various areas in GPU sharing including communication, memory allocation, and latency sensitivity [70][71][72][73][74][75]. However, none of the above works addresses the challenges and limitations of using MIG-enabled GPU sharing which, as we discussed in Sec.…”
Section: Related Work (mentioning)
confidence: 99%
“…Space sharing is suitable when a single application cannot efficiently use the entire GPU, which is addressed by Gavel [27] and Gslice [71] in MPS sharing mode. Many other works have addressed various areas in GPU sharing including communication, memory allocation, and latency sensitivity [72][73][74][75][76]. However, none of the above works addresses the challenges and limitations of using MIG-enabled GPU sharing which, as we discussed in Sec.…”
Section: Related Work (mentioning)
confidence: 99%
“…Space sharing is suitable when a single application cannot efficiently use the entire GPU, which is addressed by Gavel [65] and Gslice [95] in MPS sharing mode. Many other works have addressed various areas in GPU sharing including communication, memory allocation, and latency sensitivity [96,97,98,99,100]. This dissertation explores a newly introduced feature of Multi-Instance GPU (MIG) sharing to improve system throughput and reduce carbon emissions.…”
Section: Related Work (mentioning)
confidence: 99%