Reaño González, C.; Peña Monferrer, A. J.; Silla Jiménez, F.; Duato Marín, J. F.; Mayo Gual, R.; Quintana Ortí, E. S. (2012)

Abstract-GPUs are being increasingly embraced by the high performance computing (HPC) and computational science communities as an effective way of considerably reducing execution time by accelerating significant parts of their application codes. However, despite their extraordinary computing capabilities, the adoption of GPUs in current HPC clusters may present certain negative side-effects. In particular, to ease job scheduling in these platforms, a GPU is usually attached to every node of the cluster. Besides increasing acquisition costs, this configuration favors GPUs frequently remaining idle, as applications usually do not fully utilize them. Idle GPUs, however, consume non-negligible amounts of energy, which translates into very poor energy efficiency during idle cycles.

rCUDA was recently developed as a software solution to address these concerns. Specifically, it is a middleware that allows a reduced number of GPUs to be transparently shared among the nodes of a cluster. rCUDA thus increases the GPU utilization rate while taking care of job scheduling. Although the initial prototype versions of rCUDA demonstrated its functionality, they also revealed several concerns related to usability and performance. With respect to usability, in this paper we present a new component of the rCUDA suite that automatically transforms any CUDA source code so that it can be effectively accommodated within this technology. With respect to performance, we briefly present some promising results, which will be analyzed in depth in future publications. The net outcome is a new version of rCUDA that allows any CUDA-compatible program to use remote GPUs in a cluster with minimal overhead.
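In essence, middleware of this kind replaces the CUDA runtime library on the client nodes with a wrapper that forwards API calls over the network to a server process on the node owning the physical GPU. The following C sketch illustrates that library-interception idea only; it is not rCUDA's actual implementation, and forward_to_server() and RPC_CUDA_MALLOC are hypothetical placeholders for the real communication layer.

    /* Sketch of the library-interception idea behind GPU-virtualization
     * middleware such as rCUDA. NOT rCUDA's actual code: the RPC layer
     * below is a hypothetical stand-in for the real network transport. */
    #include <stdio.h>
    #include <stddef.h>

    typedef int cudaError_t;            /* simplified; really an enum   */
    enum { RPC_CUDA_MALLOC = 1 };       /* hypothetical RPC op code     */

    struct malloc_args { void **devPtr; size_t size; };

    /* Stub: a real middleware would marshal the arguments, ship them to
     * the node that owns the physical GPU, and return the remote result. */
    static cudaError_t forward_to_server(int op, void *args, size_t size)
    {
        (void)args;
        printf("forwarding op %d (%zu bytes of arguments) to GPU server\n",
               op, size);
        return 0;                       /* pretend: cudaSuccess */
    }

    /* Exports the same name and signature as the CUDA runtime's
     * cudaMalloc, so an application only needs relinking, not rewriting. */
    cudaError_t cudaMalloc(void **devPtr, size_t size)
    {
        struct malloc_args args = { devPtr, size };
        return forward_to_server(RPC_CUDA_MALLOC, &args, sizeof args);
    }

    int main(void)                      /* tiny usage demonstration */
    {
        void *p = NULL;
        cudaMalloc(&p, 1024);
        return 0;
    }

Plain C API calls such as this one can be intercepted at link time; CUDA's C-language extensions (for example, the kernel-launch syntax), by contrast, are the kind of construct that the source-code transformation presented in this paper must rewrite first.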
I. INTRODUCTION

Due to the high computational cost of current compute-intensive applications, many scientists view graphics processing units (GPUs) as an efficient means of reducing the execution time of their applications. High-end GPUs include an extraordinarily large number of small computing units along with a high-bandwidth link to their private on-board memory. Therefore, it is no surprise that applications exhibiting a large ratio of arithmetic operations per data item can leverage the huge potential of these hardware accelerators.

In GPU-accelerated applications, high performance is usually attained by off-loading the computationally intensive parts of the application for execution on these devices. To achieve this, programmers have to specify which portions of their code will be executed on the CPU and which functions (or kernels) will be off-loaded to the GPU; a minimal example is sketched at the end of this section. Fortunately, there have been many efforts in recent years aimed at exploiting the massive parallelism of GPUs, leading to noticeable improvements in the programmability of these hybrid platforms.
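To make the split between CPU (host) and GPU (device) code concrete, the following minimal CUDA sketch off-loads a simple scaling computation to the GPU while the surrounding code runs on the CPU; the kernel name, sizes, and values are illustrative choices, not taken from the paper.

    /* Minimal illustration of the CUDA off-loading model: the kernel
     * executes on the GPU, everything else on the CPU. */
    #include <cuda_runtime.h>
    #include <stdio.h>

    __global__ void scale(float *x, float alpha, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            x[i] *= alpha;              /* one GPU thread per element */
    }

    int main(void)
    {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);

        float *h_x = (float *)malloc(bytes);    /* CPU (host) buffer  */
        for (int i = 0; i < n; i++) h_x[i] = 1.0f;

        float *d_x;                             /* GPU (device) buffer */
        cudaMalloc((void **)&d_x, bytes);
        cudaMemcpy(d_x, h_x, bytes, cudaMemcpyHostToDevice);

        /* Off-load the computation: launch the kernel on the GPU. */
        scale<<<(n + 255) / 256, 256>>>(d_x, 2.0f, n);

        cudaMemcpy(h_x, d_x, bytes, cudaMemcpyDeviceToHost);
        printf("x[0] = %f\n", h_x[0]);          /* prints 2.000000 */

        cudaFree(d_x);
        free(h_x);
        return 0;
    }

The <<<grid, block>>> launch syntax is one of the CUDA-specific language extensions mentioned above, which is why kernels must be flagged explicitly by the programmer.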