As a critical computing resource in multiuser systems such as supercomputers, data centers, and cloud services, a GPU contains multiple compute units (CUs). GPU multitasking is an intuitive solution to underutilization in GPGPU computing. Recently proposed solutions for multitasking GPUs fall into two categories: (1) spatially partitioned sharing (SPS), which coexecutes different kernels on disjoint sets of CUs, and (2) simultaneous multikernel (SMK), which runs multiple kernels simultaneously within a CU. Compared to SPS, SMK can improve resource utilization even further by interleaving instructions from kernels with low dynamic resource contention. However, SMK is hard to implement on current GPU architectures, because (1) techniques for applying SMK on top of the GPU hardware scheduling policy are scarce and (2) finding an efficient SMK scheme is difficult due to the complex interference among concurrently executed kernels. In this article, we propose a lightweight and effective performance model to evaluate the complex interference of SMK. Built on the probability of independent events, our performance model takes a new angle and requires only a small number of parameters. We then propose a metric, the symbiotic factor, which evaluates an SMK scheme so that kernels with complementary resource utilization can corun within a CU. We also analyze the advantages and disadvantages of the kernel slicing and kernel stretching techniques and integrate them to apply SMK on real GPUs rather than on simulators. We validate our model on 18 benchmarks. Compared to hardware-based concurrent kernel execution with an optimized kernel launch order, corunning kernel pairs achieve 11%, 18%, and 12% average speedup on AMD R9 290X, RX 480, and Vega 64, respectively. Compared to Warped-Slicer, they achieve 29%, 18%, and 51% average speedup on AMD R9 290X, RX 480, and Vega 64, respectively.

7:2 H. Wu et al.

applications [30], and cloud applications [26]. Furthermore, with increasing computing power and new architectural features, new-generation GPUs can support larger and more complex computing tasks. Observations [21, 29] have shown on-chip resource underutilization during single-kernel execution. Therefore, as GPUs become more general, their underutilization is becoming a more critical issue in modern systems, and efficiently sharing GPUs among general-purpose computing on GPU (GPGPU) applications is of great importance. Programmers write a GPGPU program using the CUDA [19] or OpenCL [8] programming model and offload computation to the GPU as kernels. Corunning kernels have drawn extensive attention in both industry and academia [1, 6, 10, 13, 14, 16, 17, 20-22, 28, 29, 31, 33]. The resources used by a kernel include both static resources (threads, registers, and shared memory) and dynamic resources (computing cores, memory load/store units, bandwidth, and the memory interconnect). Modern GPU architectures, like NVIDIA Kepler [20] and AMD GCN [16], support c...
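The intuition behind pairing kernels with complementary resource utilization can be sketched as a toy calculation. This is a simplified illustration only, not the paper's actual performance model or its definition of the symbiotic factor: it merely assumes that each kernel's demand for a dynamic resource in a given cycle is an independent event, so the chance that two corunning kernels collide on that resource is the product of their individual demand probabilities. The resource names and the scoring function below are hypothetical.

```python
# Toy sketch (NOT the paper's model): estimate contention between two
# corunning kernels, assuming each kernel's per-cycle demand for a dynamic
# resource (ALUs, LD/ST units, bandwidth) is an independent event.

def contention_probability(p_a: float, p_b: float) -> float:
    """Probability that both kernels demand the resource in the same cycle,
    under independence: P(A and B) = P(A) * P(B)."""
    return p_a * p_b

def symbiosis_score(util_a: dict, util_b: dict) -> float:
    """Hypothetical complementarity score: average expected contention
    across resources, inverted so 1.0 means perfectly complementary."""
    total = sum(contention_probability(util_a[r], util_b[r]) for r in util_a)
    return 1.0 - total / len(util_a)

# A compute-bound kernel paired with a memory-bound one should score higher
# than two memory-bound kernels that both fight for LD/ST units and bandwidth.
compute_bound = {"alu": 0.9, "ldst": 0.2, "bandwidth": 0.1}
memory_bound  = {"alu": 0.2, "ldst": 0.8, "bandwidth": 0.9}

mixed_pair = symbiosis_score(compute_bound, memory_bound)
same_pair  = symbiosis_score(memory_bound, memory_bound)
print(mixed_pair > same_pair)  # complementary pair contends less
```

Under these assumptions, a scheduler could rank candidate kernel pairs by such a score and corun the most complementary pair within a CU; the paper's actual model refines this with its own parameters and validation.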