Slate: Enabling Workload-Aware Efficient Multiprocessing for Modern GPGPUs

Allen, Tyler; Feng, Xueyang; Ge, Rong

doi:10.1109/ipdps.2019.00035

Cited by 20 publications

(7 citation statements)

References 21 publications


“…Several studies focused on software mechanisms to improve the efficiency of multi-processing on GPUs. T. Allen et al proposed a framework called Slate that optimizes the combination of co-located processes and dynamically adjusts the scales of them [27]. smCompactor is a similar framework to Slate, which aims at maximizing the resource utilization [28].…”

Section: Related Work (mentioning)

confidence: 99%

Hierarchical Resource Partitioning on Modern GPUs: A Reinforcement Learning Approach

Saroliya, Arima, Liu et al. 2023

2023 IEEE International Conference on Cluster Computing (CLUSTER)

GPU-based heterogeneous architectures are now commonly used in HPC clusters. Due to their architectural simplicity specialized for data-level parallelism, GPUs can offer much higher computational throughput and memory bandwidth than CPUs in the same generation do. However, as the available resources in GPUs have increased exponentially over the past decades, it has become increasingly difficult for a single program to fully utilize them. As a consequence, the industry has started supporting several resource partitioning features in order to improve the resource utilization by co-scheduling multiple programs on the same GPU die at the same time. Driven by the technological trend, this paper focuses on hierarchical resource partitioning on modern GPUs, and as an example, we utilize a combination of two different features available on recent NVIDIA GPUs in a hierarchical manner: MPS (Multi-Process Service), a finer-grained logical partitioning; and MIG (Multi-Instance GPU), a coarse-grained physical partitioning. We propose a method for comprehensively co-optimizing the setup of hierarchical partitioning and the selection of co-scheduling groups from a given set of jobs, based on reinforcement learning using their profiles. Our thorough experimental results demonstrate that our approach can successfully set up job concurrency, partitioning, and co-scheduling group selections simultaneously. This results in a maximum throughput improvement by a factor of 1.87 compared to the time-sharing scheduling.


“…Allen et al [5] presented Slate, a software-based workload-aware GPGPU multitasking framework. Similar to Maestro, Slate selects concurrent workloads that have complementary resource demands at run-time to minimize interference for individual workloads and improve resource utilization.…”

Section: Related Work (mentioning)

confidence: 99%

“…Multitasking of heterogeneous workloads has not received much attention from traditional GPU management as GPUs are generally adopted in systems dedicated to specific workloads. However, due to the widespread adoption of cloud systems, heterogeneous workloads are concurrently executed within a GPGPU device, and thus maximizing resource utilization by multitasking in GPGPU has become an important issue [4,5,6]. As shown in Figure 1, modern cloud systems are equipped with GPGPU devices along with traditional host resources (CPU, memory, storage, etc.…”

Section: Introduction (mentioning)

confidence: 99%

Characterizing Fine-Grained Resource Utilization for Multitasking GPGPU in Cloud Systems

Cho, Bahn 2021

IEEE Access

Managing GPGPU resources in cloud systems is challenging as workloads with various resource usage patterns coexist. To determine the co-location of workloads, previous studies have shown that run-time performance profiling and dynamic relocation of workloads is necessary due to interference between workloads. However, this makes instant scheduling difficult and also affects the performance of workload executions. In this article, we show that efficient resource sharing in GPGPU is possible without run-time profiling if resource usage characteristics of workloads are analyzed down to a fine-grained unit level. To extract workload characteristics, we do not perform profiling at scheduling time, but separate profiling from scheduling, thereby reducing the run-time complexity of previous approaches. Specifically, we anatomize the characteristics of various GPGPU workloads and present a new scheduling policy that aims at balancing resource utilization by co-locating workloads with complementary resource demands. Simulation experiments under various virtual machine scenarios show that the proposed policy improves the GPGPU throughput by 119.5% on average and up to 191.7%.

Index Terms: GPGPU, resource utilization, cloud system, multitasking, thread block scheduler.
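The idea of co-locating workloads with complementary resource demands, which recurs across the statements and abstracts on this page, can be illustrated with a small sketch. This is not the policy from any of the listed papers: the `Workload` profile fields, the `pair_score` heuristic, and the greedy pairing loop are all hypothetical, and they assume that each workload's offline profile reduces to two utilization fractions and that workload names are unique.

```python
from dataclasses import dataclass
from itertools import combinations

NEG_INF = float("-inf")


@dataclass(frozen=True)
class Workload:
    # Hypothetical offline profile: fraction of each GPU resource used when run alone.
    name: str
    compute: float
    bandwidth: float


def pair_score(a, b):
    """Score a candidate pair; pairs that oversubscribe a resource are infeasible."""
    compute = a.compute + b.compute
    bandwidth = a.bandwidth + b.bandwidth
    if compute > 1.0 or bandwidth > 1.0:
        return NEG_INF
    # Assumed heuristic: prefer the pair with the highest combined utilization.
    return compute + bandwidth


def co_locate(workloads):
    """Greedily group workloads into feasible pairs; leftovers run alone."""
    remaining = list(workloads)
    groups = []
    while len(remaining) > 1:
        best = max(combinations(remaining, 2), key=lambda p: pair_score(*p))
        if pair_score(*best) == NEG_INF:
            break  # no feasible pair is left
        groups.append((best[0].name, best[1].name))
        remaining = [w for w in remaining if w not in best]
    groups.extend((w.name,) for w in remaining)
    return groups


jobs = [
    Workload("A", compute=0.75, bandwidth=0.125),   # compute-bound
    Workload("B", compute=0.125, bandwidth=0.75),   # bandwidth-bound
    Workload("C", compute=0.625, bandwidth=0.125),  # compute-bound
    Workload("D", compute=0.25, bandwidth=0.5),     # mixed
]
print(co_locate(jobs))
```

In this toy input the compute-bound job A is paired with the bandwidth-bound job B, while two compute-bound jobs would never share the device because their combined compute demand exceeds 1.0. A real scheduler would also have to account for interference that such additive profiles do not capture.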


“…In this paper, we tackle a more complex problem as it includes not only the task to partition allocation, but as well defining the partition size. Therefore, we investigate tractable heuristics aiming to explore the space of possible solutions with a reasonable complexity (pseudo polynomial complexity of θ(M • N • H) 2 . We highlight that we may omit the task index when it is not necessary.…”

Section: Heuristics (mentioning)

confidence: 99%

“…The authors of [2] propose a software based kernel scheduler, Slate. Slate finds complementary resource demands to co-schedule kernels while minimizing the interference between concurrent kernels.…”

Section: Related Work (mentioning)

confidence: 99%

Contention-Aware GPU Partitioning and Task-to-Partition Allocation for Real-Time Workloads

Zahaf, Olmedo, Singh et al. 2021

Preprint

In order to satisfy timing constraints, modern real-time applications require massively parallel accelerators such as General Purpose Graphic Processing Units (GPGPUs). Generation after generation, the number of computing clusters made available in novel GPU architectures is steadily increasing, hence, investigating suitable scheduling approaches is now mandatory. Such scheduling approaches are related to mapping different and concurrent compute kernels within the GPU computing clusters, hence grouping GPU computing clusters into schedulable partitions. In this paper we propose novel techniques to define GPU partitions; this allows us to define suitable task-to-partition allocation mechanisms in which tasks are GPU compute kernels featuring different timing requirements. Such mechanisms will take into account the interference that GPU kernels experience when running in overlapping time windows. Hence, an effective and simple way to quantify the magnitude of such interference is also presented. We demonstrate the efficiency of the proposed approaches against the classical techniques that considered the GPU as a single, nonpartitionable resource.
