Astraea: towards QoS-aware and resource-efficient multi-stage GPU services

Zhang, Wei; Chen, Quan; Fu, Kaihua; Zheng, Ningxin; Huang, Zhiyi; Leng, Jingwen; Guo, Minyi

doi:10.1145/3503222.3507721

Cited by 18 publications

(3 citation statements)

References 54 publications


“…Space sharing is suitable when a single application cannot efficiently use the entire GPU, which is addressed by Gavel [25] and Gslice [69] in MPS sharing mode. Many other works have addressed various areas in GPU sharing including communication, memory allocation, and latency sensitivity [70][71][72][73][74][75]. However, none of the above works addresses the challenges and limitations of using MIG-enabled GPU sharing which, as we discussed in Sec.…”

Section: Related Work (mentioning)

confidence: 99%

Using Multi-Instance GPU for Efficient Operation of Multi-Tenant GPU Clusters

Li, Patel, Samsi et al. 2022

Preprint


GPU technology has been improving at an expedited pace in terms of size and performance, empowering HPC and AI/ML researchers to advance the scientific discovery process. However, this also leads to inefficient resource usage, as most GPU workloads, including complicated AI/ML models, are not able to utilize the GPU resources to their fullest extent. We propose MISO, a technique to exploit the Multi-Instance GPU (MIG) capability of NVIDIA A100 GPUs to dynamically partition GPU resources among co-located jobs. MISO's key insight is to use the lightweight, more flexible Multi-Process Service (MPS) capability to predict the best MIG partition allocation for different jobs, without incurring the overhead of implementing them during exploration. Due to its ability to utilize GPU resources more efficiently, MISO achieves 49% and 16% lower average job completion time than the unpartitioned and optimal static GPU partition schemes, respectively.


“…Space sharing is suitable when a single application cannot efficiently use the entire GPU, which is addressed by Gavel [27] and Gslice [71] in MPS sharing mode. Many other works have addressed various areas in GPU sharing including communication, memory allocation, and latency sensitivity [72][73][74][75][76]. However, none of the above works addresses the challenges and limitations of using MIG-enabled GPU sharing which, as we discussed in Sec.…”

Section: Related Work (mentioning)

confidence: 99%

Miso

Patel, Samsi et al. 2022

Proceedings of the 13th Symposium on Cloud Computing


GPU technology has been improving at an expedited pace in terms of size and performance, empowering HPC and AI/ML researchers to advance the scientific discovery process. However, this also leads to inefficient resource usage, as most GPU workloads, including complicated AI/ML models, are not able to utilize the GPU resources to their fullest extent, encouraging support for GPU multi-tenancy. We propose MISO, a technique to exploit the Multi-Instance GPU (MIG) capability on the latest NVIDIA datacenter GPUs (e.g., A100, H100) to dynamically partition GPU resources among colocated jobs. MISO's key insight is to use the lightweight, more flexible Multi-Process Service (MPS) capability to predict the best MIG partition allocation for different jobs, without incurring the overhead of implementing them during exploration. Due to its ability to utilize GPU resources more efficiently, MISO achieves 49% and 16% lower average job completion time than the unpartitioned and optimal static GPU partition schemes, respectively.

CCS Concepts: Computer systems organization → Cloud computing.


“…Space sharing is suitable when a single application cannot efficiently use the entire GPU, which is addressed by Gavel [65] and Gslice [95] in MPS sharing mode. Many other works have addressed various areas in GPU sharing including communication, memory allocation, and latency sensitivity [96,97,98,99,100]. This dissertation explores a newly introduced feature of Multi-Instance GPU (MIG) sharing to improve system throughput and reduce carbon emissions.…”

Section: Related Work (mentioning)

confidence: 99%

Making machine learning on HPC systems cost-effective and carbon-friendly


Dissertation

This dissertation sets out to optimize machine learning (ML) systems for both cost-efficiency and environmental sustainability, achieving these goals autonomously without the need for developer or user intervention. It introduces a range of innovations that, when synergized with strategies for cost-informed configuration and carbon-aware scheduling, enable the system to become cost-effective and carbon-friendly, respectively.

