GPUShare: Fair-Sharing Middleware for GPU Clouds

Goswami, Anshuman; Young, Jeffrey; Schwan, Karsten; Farooqui, Naila; Gavrilovska, Ada; Wolf, Matthew; Eisenhauer, Greg

doi:10.1109/ipdpsw.2016.94

Cited by 14 publications

(5 citation statements)

References 14 publications


“…Multi-tenancy on accelerators. Sharing of GPUs across applications has been studied for cloud servers [11,14,22]. Olympian [22] and GPUShare [14] focus on sharing a single GPU across multiple users, while GSLICE [11] focuses cluster-level sharing.…”

Section: Related Work (mentioning)

confidence: 99%

“…Sharing of GPUs across applications has been studied for cloud servers [11,14,22]. Olympian [22] and GPUShare [14] focus on sharing a single GPU across multiple users, while GSLICE [11] focuses cluster-level sharing. In contrast to these efforts, we focus on analytic models, using queueing theory, to enable GPU or TPU multiplexing while providing response time guarantees.…”

Section: Related Work (mentioning)

confidence: 99%

“…Similar to traditional cloud platforms, edge clouds will also be multi-tenant in nature, which means that each edge cloud server will run multiple tenant applications. These applications share the hardware resources of edge servers, including accelerators [11,14,22]. While conventional resources such as CPU and even server GPUs [21] support virtualization features to enable them to be multiplexed across applications, edge accelerators lack such hardware features.…”

Section: Introduction (mentioning)

confidence: 99%


Model-driven Cluster Resource Management for AI Workloads in Edge Clouds

Liang, Hanafy, Ali-Eldin et al. 2022 (preprint)


Since emerging edge applications such as Internet of Things (IoT) analytics and augmented reality have tight latency constraints, hardware AI accelerators have been recently proposed to speed up deep neural network (DNN) inference run by these applications. Resource-constrained edge servers and accelerators tend to be multiplexed across multiple IoT applications, introducing the potential for performance interference between latency-sensitive workloads. In this paper, we design analytic models to capture the performance of DNN inference workloads on shared edge accelerators, such as GPU and edgeTPU, under different multiplexing and concurrency behaviors. After validating our models using extensive experiments, we use them to design various cluster resource management algorithms to intelligently manage multiple applications on edge accelerators while respecting their latency constraints. We implement a prototype of our system in Kubernetes and show that our system can host 2.3X more DNN applications in heterogeneous multi-tenant edge clusters with no latency violations when compared to traditional knapsack hosting algorithms.
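The abstract and the citation statements above say that this work uses analytic, queueing-theoretic models to decide how many applications can share an accelerator without violating latency constraints, but they do not give the models themselves. As a rough illustration only, the sketch below substitutes the textbook M/M/1 queue, whose mean response time is 1/(μ − λ), and uses it as an admission test. The function names, the single shared service rate, and the use of M/M/1 are all assumptions made for this example; they are not the paper's models.

```python
def mm1_mean_response_time(arrival_rate, service_rate):
    """Mean response time (queueing + service) of an M/M/1 queue.

    Rates are in requests per second; the result is in seconds.
    ASSUMPTION: M/M/1 is a textbook stand-in, not the cited paper's model.
    """
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("arrival_rate must be >= 0 and service_rate > 0")
    if arrival_rate >= service_rate:
        return float("inf")  # the queue is unstable: latency grows without bound
    return 1.0 / (service_rate - arrival_rate)


def can_admit(existing_rates, new_rate, service_rate, latency_bound):
    """Hypothetical admission test for one shared accelerator.

    ASSUMPTION: all applications share one queue and one service rate,
    so their arrival rates simply add up.
    """
    total_rate = sum(existing_rates) + new_rate
    return mm1_mean_response_time(total_rate, service_rate) <= latency_bound


if __name__ == "__main__":
    # Two tenants at 30 and 20 req/s on an accelerator serving 100 req/s.
    print(can_admit([30.0, 20.0], 10.0, 100.0, latency_bound=0.05))
    print(can_admit([30.0, 20.0], 45.0, 100.0, latency_bound=0.05))
```

In this toy setting the first request is admitted (aggregate load 60 req/s gives a 25 ms mean response time) and the second is rejected (95 req/s gives 200 ms), which shows why a placement based on capacity alone can violate latency bounds.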


“…In [30], authors propose visual analysis techniques to evaluate the execution time of high-performance applications on hybrid architectures. GPUShare [15] is a middleware solution for achieving fair sharing among different GPU processes. Chen and Lee [8] propose G-Storm, a scheduling algorithm that targets Storm big data platforms.…”

Section: Related Work (mentioning)

confidence: 99%

Scheduling for heterogeneous systems in accelerator-rich environments

Yesil, Öztürk 2021, J Supercomput


The world is creating ever more data and the applications are required to deal with ever-increasing datasets. To process such datasets heterogeneous and manycore accelerators are being deployed in various computing systems to improve energy efficiency. In this work, we present a runtime management system designed for such heterogeneous systems with manycore accelerators. More specifically, we design a resource-based runtime management system that considers application characteristics and respective execution properties on the nodes and accelerators. We propose scheduling heuristics and run time environment solutions to achieve better throughput and reduced energy in computing systems with different accelerators. We give implementation details about our framework; show different scheduling algorithms, and present experimental evaluation of our system. We also compare our approaches with an optimal scheme where integer linear programming approach has been implemented for mapping applications on the heterogeneous system. While it is possible to extend the proposed framework to a wide variety of accelerators, our initial focus is on Graphics Processing Units (GPUs). Our experimental evaluations show that including accelerator support in the management framework improves energy consumption and execution time significantly. We believe that this approach has the potential to provide an effective solution for next generation accelerator-based computing systems.


“…GPUShare [39] schedules GPU kernels by controlling the number of executed TBs. When the TBs are dispatched, each of them checks whether the execution time of the kernel has exceeded a specified period.…”

Section: Related Work (mentioning)

confidence: 99%

Cooperative GPGPU Scheduling for Consolidating Server Workloads

Suzuki, Yamada, Kato et al. 2018, IEICE Trans. Inf. & Syst.


Graphics processing units (GPUs) have become an attractive platform for general-purpose computing (GPGPU) in various domains. Making GPUs a time-multiplexing resource is a key to consolidating GPGPU applications (apps) in multi-tenant cloud platforms. However, advanced GPGPU apps pose a new challenge for consolidation. Such highly functional GPGPU apps, referred to as GPU eaters, can easily monopolize a shared GPU and starve collocated GPGPU apps. This paper presents GLoop, which is a software runtime that enables us to consolidate GPGPU apps including GPU eaters. GLoop offers an event-driven programming model, which allows GLoop-based apps to inherit the GPU eaters' high functionality while proportionally scheduling them on a shared GPU in an isolated manner. We implemented a prototype of GLoop and ported eight GPU eaters on it. The experimental results demonstrate that our prototype successfully schedules the consolidated GPGPU apps on the basis of its scheduling policy and isolates resources among them.
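The citation statement for this entry describes GPUShare as bounding a kernel's run time by having each dispatched thread block (TB) check whether the kernel has exceeded a specified period. The sketch below is a serial, host-side simulation of that idea, not GPUShare's code: TBs are modelled as running one after another at a fixed cost, and the behaviour after the check (remaining TBs exit early and the kernel is relaunched from the first unexecuted TB) is an assumption, since the statement does not say what happens next. All names are hypothetical.

```python
def run_kernel_slice(num_blocks, block_cost, budget, start_block=0):
    """Simulate one kernel launch with a per-launch time budget.

    Each thread block checks the elapsed kernel time before doing work.
    ASSUMPTION: a block that finds the budget exhausted exits without
    working, and so do all later blocks.
    Returns (index of the first unexecuted block, elapsed time).
    """
    if budget <= 0 or block_cost < 0:
        raise ValueError("budget must be > 0 and block_cost >= 0")
    elapsed = 0.0
    block = start_block
    while block < num_blocks:
        if elapsed >= budget:
            break  # time check failed: stop executing further blocks
        elapsed += block_cost  # this block does its share of the work
        block += 1
    return block, elapsed


def launches_to_completion(num_blocks, block_cost, budget):
    """Count the launches needed when each launch is cut off at `budget`.

    ASSUMPTION: the middleware relaunches the kernel from the first
    unexecuted block until all blocks have run.
    """
    launches = 0
    block = 0
    while block < num_blocks:
        block, _ = run_kernel_slice(num_blocks, block_cost, budget, block)
        launches += 1
    return launches


if __name__ == "__main__":
    print(run_kernel_slice(10, 1.0, 3.0))
    print(launches_to_completion(10, 1.0, 3.0))
```

With ten blocks of one time unit each and a budget of three units, every launch executes three blocks before the check fails, so the simulated kernel needs four launches. On a real GPU many TBs run concurrently, so the number of blocks completed per period would depend on occupancy rather than on a simple serial count.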
