2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA) 2014
DOI: 10.1109/isca.2014.6853208

Enabling preemptive multiprogramming on GPUs

Abstract: GPUs are being increasingly adopted as compute accelerators in many domains, spanning environments from mobile systems to cloud computing. These systems usually run multiple applications from one or several users. However, GPUs do not provide the support for resource sharing traditionally expected in these scenarios. Thus, such systems are unable to provide key multiprogrammed workload requirements, such as responsiveness, fairness or quality of service. In this paper, we propose a set of hardware exten…

Cited by 119 publications (63 citation statements)
References 22 publications

“…Moreover, each SM has its own on-chip scratch-pad memory, which is shared by the threads within a thread block. For modern GPUs, the context of a single SM can be as large as 256kB of register file and 48kB of shared memory [1,24,29]. With such a large context, preempting with context switching has high overhead in both preemption latency and wasted throughput.…”
Section: Prior Preemption Techniques (mentioning)
confidence: 99%
“…However, supporting preemptive multitasking on GPUs through context switching can incur a higher overhead compared to CPUs, where the context of an SM can be as large as 256kB of register file and 48kB of on-chip scratch-pad memory [1,24,29]. Not only does a kernel have to endure a long preemption latency, the GPU also wastes execution resources while context switching.…”
Section: Introduction (mentioning)
confidence: 99%
“…As many applications do not require full GPU resources, spatial multitasking can improve total system throughput with concurrent execution of multiple applications compared to temporal multitasking [12,14].…”
Section: Introduction (mentioning)
confidence: 99%