2013 IEEE International Conference on Cluster Computing (CLUSTER) 2013
DOI: 10.1109/cluster.2013.6702662

Influence of InfiniBand FDR on the performance of remote GPU virtualization

Abstract: The use of GPUs to accelerate general-purpose scientific and engineering applications is mainstream today, but their adoption in current high-performance computing clusters is impaired primarily by acquisition costs and power consumption. Therefore, the benefits of sharing a reduced number of GPUs among all the nodes of a cluster can be remarkable for many applications. This approach, usually referred to as remote GPU virtualization, aims at reducing the number of GPUs present in a cluster, while incr…

Cited by 20 publications
(23 citation statements)
references
References 17 publications
“…So the predominant communication mode between a compute node and composable GPU and FPGA resources is likely through bulk data transfer. It has been shown by [37] that adequate bandwidth such as those offered by RDMA at FDR data rate (56 Gb/s) already demonstrated superior performance than a locally connected GPU.…”
Section: Software Stack (mentioning)
confidence: 99%
“…Although remote GPU virtualization has demonstrated very low overhead with respect to a configuration with a local GPU [22], due to its novelty, this technology is not yet supported by the job schedulers that are commonly encountered in production clusters (e.g., SLURM [23], PBSPro [24], MOAB [25], TORQUE [26], LSF [27], OAR [28], MAUI [29], LoadLever [30], Condor [31], and Sun Grid Engine [32]). In particular, a common job scheduler in production today only deals with real GPUs so that, when a job requests a number of nodes equipped with one (or more) GPU(s), the scheduler will try to map that job to nodes that actually own the requested number of GPUs, thus impairing the benefits of GPU virtualization.…”
Section: Introduction (mentioning)
confidence: 99%
“…Remote GPU virtualization techniques can help increase GPU utilization rates, while reducing acquisition and maintenance costs. For these reasons, many different virtualization solutions are available today, such as rCUDA [4], [5], SnuCL [6], GVirtuS [7], DS-CUDA [8], and VOCL [9].…”
Section: Introduction (mentioning)
confidence: 99%