2019
DOI: 10.48550/arxiv.1901.00041
Preprint

Dynamic Space-Time Scheduling for GPU Inference

Cited by 10 publications (11 citation statements)
References 0 publications
“…This approach is set to improve on the low utilization and poor scaling of unshared access to a GPU. The idea of GPU sharing is promising, as seen in [13], where the authors studied the performance of temporal and spatial GPU sharing, and in [14], which presented a GPU cluster manager enabling GPU sharing for DL jobs.…”
Section: G. Discussion on Resource Sharing (mentioning)
confidence: 99%
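To make the temporal/spatial sharing idea concrete, here is a minimal CUDA sketch, not taken from [13] or [14]: two independent kernels (standing in for two tenants' work) are launched on separate streams, so the hardware may space-share them across SMs when resources allow, or time-slice them otherwise. Kernel names and problem sizes are illustrative assumptions.

```cuda
#include <cuda_runtime.h>

// Two independent dummy kernels standing in for two tenants' GPU work.
__global__ void tenantA(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * 2.0f + 1.0f;
}

__global__ void tenantB(float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = y[i] * 0.5f;
}

int main() {
    const int n = 1 << 20;
    float *a, *b;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));

    // One stream per tenant: kernels in different streams may execute
    // concurrently (space-shared across SMs) if resources allow;
    // otherwise the hardware serializes or time-slices them.
    cudaStream_t sA, sB;
    cudaStreamCreate(&sA);
    cudaStreamCreate(&sB);

    tenantA<<<(n + 255) / 256, 256, 0, sA>>>(a, n);
    tenantB<<<(n + 255) / 256, 256, 0, sB>>>(b, n);

    cudaDeviceSynchronize();
    cudaStreamDestroy(sA);
    cudaStreamDestroy(sB);
    cudaFree(a);
    cudaFree(b);
    return 0;
}
```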
“…where α denotes the weight of the average latency in the objective function. The first constraint in problem (13) ensures that a request from a specific IoT node can be processed by only one edge node. Constraint (4) ensures that the RTT cannot exceed the maximum tolerated latency.…”
Section: E. Problem Formulation (mentioning)
confidence: 99%
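The quoted formulation itself is not reproduced on this page. A plausible shape consistent with the statement, written as a hedged LaTeX sketch, might assume binary assignment variables x_ij (request i served by edge node j); the average-latency term L-bar, the secondary cost term C, and the bound L_max are all hypothetical symbols, with only α's role given by the quote:

```latex
\begin{align}
\min_{x}\quad & \alpha\,\bar{L}(x) + (1-\alpha)\,C(x) \\
\text{s.t.}\quad & \sum_{j} x_{ij} = 1 \quad \forall i
  && \text{(each IoT request served by exactly one edge node)} \\
& \mathrm{RTT}_{ij}\,x_{ij} \le L_{\max} \quad \forall i,j
  && \text{(round-trip time within the tolerated latency)} \\
& x_{ij} \in \{0,1\}
\end{align}
```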
“…However, this approach does not consider the microarchitectural interactions of the NVIDIA scheduling hierarchy, such as the thread block scheduler, which, as we have demonstrated, impact the performance of concurrent workloads. Preliminary work by Jain et al. on deep-learning inference-only workloads suggests that combining spatial and temporal multitasking may outperform either in isolation [12]. We discuss this possibility further in Section 5.…”
Section: Related Work (mentioning)
confidence: 96%
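One way to see why the thread block scheduler matters for concurrent workloads: the number of a kernel's blocks resident per SM caps how much of the GPU it can occupy spatially. The following minimal CUDA sketch, an illustration assumed here rather than taken from [12], queries that bound via the runtime occupancy API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// A dummy kernel standing in for an inference workload.
__global__ void inferenceKernel(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] += 1.0f;
}

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    // Ask the runtime how many blocks of this kernel can be resident
    // on one SM at a 256-thread block size. The thread block scheduler
    // never co-locates more than this, which bounds the kernel's
    // spatial footprint when it runs concurrently with other work.
    int blocksPerSM = 0;
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &blocksPerSM, inferenceKernel, /*blockSize=*/256,
        /*dynamicSMemSize=*/0);

    printf("Resident blocks per SM: %d (device has %d SMs)\n",
           blocksPerSM, prop.multiProcessorCount);
    return 0;
}
```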
“…We observed that such workloads have fluctuating resource requirements, variable kernel runtimes, sequential kernel launches, and unpredictable arrival times. Previously proposed thread-block-level scheduling policies [2,12,20,25,28,29] focus only on more generic workloads that do not possess these characteristics. Finally, we add to prior understanding of the CUDA scheduling hierarchy and its concurrency mechanisms [3,6,16,23].…”
Section: Introduction (mentioning)
confidence: 99%
“…The typical use of a GPU as a single resource leads to under-utilization of its computing power. This has been shown in many works in the neural-network literature [7] and on well-known benchmarks [12] such as Parboil [13] and Rodinia [4]. Resource under-utilization and unpredictability are further exacerbated as the number of available computing clusters in a GPU increases.…”
Section: Introduction (mentioning)
confidence: 95%