2011 International Conference on Parallel Processing
DOI: 10.1109/icpp.2011.88

GPU Resource Sharing and Virtualization on High Performance Computing Systems

Abstract: Modern Graphics Processing Units (GPUs) are widely used as application accelerators in the High Performance Computing (HPC) field due to their massive floating-point computational capabilities and highly data-parallel computing architecture. Contemporary high performance computers equipped with co-processors such as GPUs primarily execute parallel applications using the Single Program Multiple Data (SPMD) model, which requires balanced computing resources of both microprocessors and co-processors to ensure full sy…

Cited by 43 publications (26 citation statements)
References 14 publications
“…The other category of solutions is to provide process-level GPU sharing. Our previous work [18] presented a GPU virtualization infrastructure that provides a virtual SPMD model by exposing multiple virtual GPU interfaces to the processors. The virtualization infrastructure allows for multiple processes to share the GPU using a single GPU context and to concurrently execute GPU kernels, as well as to achieve concurrency between data transfer and kernel execution.…”
Section: Related Work
confidence: 99%
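
The statement above refers to sharing a single GPU context across processes and overlapping data transfer with kernel execution. The following is a minimal CUDA sketch of that overlap idea only, not the cited virtualization infrastructure; the scale kernel, chunk count, and sizes are illustrative assumptions. Each chunk's copies and kernel are issued on their own stream inside one process (and therefore one GPU context), so the transfer for one chunk can overlap with the kernel working on another.

// Minimal sketch (assumed names and sizes): copy/compute overlap with CUDA streams.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scale(float *d, int n, float f) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= f;
}

int main() {
    const int N = 1 << 20, CHUNKS = 4, CHUNK = N / CHUNKS;
    float *h, *d;
    cudaMallocHost((void**)&h, N * sizeof(float));   // pinned memory, needed for truly async copies
    cudaMalloc((void**)&d, N * sizeof(float));
    for (int i = 0; i < N; ++i) h[i] = 1.0f;

    cudaStream_t s[CHUNKS];
    for (int c = 0; c < CHUNKS; ++c) cudaStreamCreate(&s[c]);

    // Each chunk goes to a different stream, so the copy of chunk c+1
    // can overlap with the kernel processing chunk c.
    for (int c = 0; c < CHUNKS; ++c) {
        size_t off = (size_t)c * CHUNK;
        cudaMemcpyAsync(d + off, h + off, CHUNK * sizeof(float),
                        cudaMemcpyHostToDevice, s[c]);
        scale<<<(CHUNK + 255) / 256, 256, 0, s[c]>>>(d + off, CHUNK, 2.0f);
        cudaMemcpyAsync(h + off, d + off, CHUNK * sizeof(float),
                        cudaMemcpyDeviceToHost, s[c]);
    }
    cudaDeviceSynchronize();
    printf("h[0] = %f (expect 2.0)\n", h[0]);

    for (int c = 0; c < CHUNKS; ++c) cudaStreamDestroy(s[c]);
    cudaFree(d);
    cudaFreeHost(h);
    return 0;
}

Pinned host memory is what makes cudaMemcpyAsync genuinely asynchronous; because all streams belong to a single process, the GPU sees a single context, which is the property the cited virtualization layer provides to otherwise separate processes.
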
“…As we explained earlier, the latest Kepler GPU architecture [19] provides native hardware Hyper-Q support, which allows multiple processes to share a single GPU context using the CUDA proxy server feature. Since both our GPU virtualization approach (for Fermi or earlier GPUs) and the Hyper-Q feature (for the Kepler series of GPUs) serve to meet the single-GPU-context requirement for efficient GPU sharing, we provide here a brief description of the GPU virtualization approach addressed by our previous work [18]. Figure 2 shows that all SPMD GPU kernels are executed within the single daemon process using CUDA streams.…”
Section: GPU Sharing Approach with Streams for SPMD Programs
confidence: 99%
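
The statement above describes a single daemon process issuing all SPMD kernels on CUDA streams within one context, the same property Hyper-Q exploits in hardware on Kepler. Below is a minimal sketch of that single-process, multi-stream pattern; it is not the cited daemon, and client_kernel, the client count, and buffer sizes are assumptions made for illustration.

// Minimal sketch (assumed names): one process, one context, per-client streams.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void client_kernel(float *buf, int n, int client) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = (float)client;   // stand-in for one client's SPMD work
}

int main() {
    const int NCLIENTS = 4, N = 1 << 18;
    cudaStream_t streams[NCLIENTS];
    float *bufs[NCLIENTS];

    for (int c = 0; c < NCLIENTS; ++c) {
        cudaStreamCreate(&streams[c]);
        cudaMalloc((void**)&bufs[c], N * sizeof(float));
    }

    // All launches come from the same process and hence the same GPU context;
    // placing each logical client's work on its own stream lets the kernels
    // run concurrently instead of serializing across per-process contexts.
    for (int c = 0; c < NCLIENTS; ++c)
        client_kernel<<<(N + 255) / 256, 256, 0, streams[c]>>>(bufs[c], N, c);

    cudaDeviceSynchronize();
    printf("issued %d client kernels on %d streams\n", NCLIENTS, NCLIENTS);

    for (int c = 0; c < NCLIENTS; ++c) {
        cudaFree(bufs[c]);
        cudaStreamDestroy(streams[c]);
    }
    return 0;
}
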
“…Li et al. [20] introduced a virtualization layer that makes all the participating processes execute kernels in the same GPU context, similar to NVIDIA MPS [24]. GERM [7] and TimeGraph [17] focus on graphics applications and provide GPU command schedulers integrated into the device driver.…”
Section: Related Work
confidence: 99%
“…To overcome the inefficiencies introduced by multiple processes sharing the GPU [20], NVIDIA provides a software solution called Multi-Process Service (MPS) [24]. MPS instantiates a proxy process that receives requests from client processes (e.g., processes in an MPI application) and executes them on the GPU.…”
Section: GPU Program Execution
confidence: 99%
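
As a rough illustration of the multi-process scenario MPS addresses, the sketch below has each MPI rank act as an independent CUDA client launching its own kernel. Without MPS, every rank gets its own GPU context; with the MPS control daemon running (e.g., started via nvidia-cuda-mps-control -d), the proxy process funnels all ranks through one shared context. The kernel and buffer names are assumptions, and the code itself is unchanged in either mode.

// Minimal sketch (assumed names): each MPI rank is an ordinary CUDA client;
// MPS, if enabled, multiplexes all ranks onto a single shared GPU context.
#include <mpi.h>
#include <cuda_runtime.h>
#include <cstdio>

__global__ void rank_kernel(float *buf, int n, int rank) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = (float)rank;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int N = 1 << 18;
    float *buf;
    cudaMalloc((void**)&buf, N * sizeof(float));

    // Each rank submits work independently; whether these launches share a
    // single GPU context (and can overlap on the device) is decided by MPS.
    rank_kernel<<<(N + 255) / 256, 256>>>(buf, N, rank);
    cudaDeviceSynchronize();

    printf("rank %d finished its kernel\n", rank);
    cudaFree(buf);
    MPI_Finalize();
    return 0;
}

The binary would typically be built with nvcc through an MPI compiler wrapper (e.g., nvcc -ccbin mpicxx) and launched with mpirun; only the context sharing on the device changes when MPS is active.
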