2020
DOI: 10.1007/s10586-019-03037-6
Interference-aware parallelization for deep learning workload in GPU cluster

Cited by 19 publications (8 citation statements)
References 21 publications
“…Thus, GPUs can process a large number of data points in parallel, which leads to higher computational throughput. Training deep learning models is a computationally expensive and time-consuming process due to the tremendous volume of data required, and leveraging scalable computation resources can speed up training significantly [15,24]. Recently, research effort has focused on speeding up the training process.…”
Section: Introduction (mentioning)
confidence: 99%
“…Typically, there is no significant performance degradation until the overall CPU utilization of a server approaches the number of physical cores. The reason is that, in the architecture of a physical CPU, the L1/L2 cache is isolated for each physical core [9]. Cache contention occurs when demand exceeds the number of physical cores, which is common since one physical core is typically virtualized into two logical cores for DL developers (through Hyper-Threading) [19].…”
Section: B. Model (mentioning)
confidence: 99%
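For illustration, here is a minimal sketch (assuming the psutil package is available; not code from the cited paper) of the condition described in the statement above: contention becomes likely once aggregate CPU demand approaches the number of physical cores, even though twice as many logical cores are exposed through Hyper-Threading.

```python
# Illustrative sketch only: compare aggregate CPU demand against the number of
# physical cores, the contention threshold described in the quoted statement.
import psutil

physical = psutil.cpu_count(logical=False)  # physical cores, each with private L1/L2 cache
logical = psutil.cpu_count(logical=True)    # logical cores exposed to DL developers

# Average system-wide utilization over one second, expressed in "cores' worth" of work.
busy_cores = psutil.cpu_percent(interval=1.0) / 100.0 * logical

if busy_cores >= physical:
    print(f"CPU demand ({busy_cores:.1f} cores) exceeds {physical} physical cores: "
          "L1/L2 cache contention between co-located jobs is likely.")
else:
    print(f"CPU demand ({busy_cores:.1f} cores) is below {physical} physical cores: "
          "little interference expected.")
```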
“…Some white-box studies build explicit interference models to predict the performance slowdown of co-locations, e.g., for MapReduce tasks [7] or VM tasks with I/O contention [8]. Other, DNN-based approaches use large amounts of historical traces to learn the interference levels of co-located ML jobs [9], or equip the scheduler with a reinforcement learning (RL) model that improves the job placement policy through exploration and feedback [10,11].…”
Section: Introduction (mentioning)
confidence: 99%
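As a rough illustration of the white-box idea mentioned above (an explicit model that predicts co-location slowdown), the sketch below fits a simple linear model from shared-resource utilization to a slowdown factor. The features, coefficients, and data points are hypothetical placeholders, not the models of the cited works.

```python
# Illustrative linear interference model: predict the slowdown of a job from
# the resource utilization it shares with a co-located job.
import numpy as np

# Each row: [combined GPU util, combined PCIe bandwidth util, combined CPU util]
# for a pair of co-located jobs (hypothetical values); target: measured slowdown (>= 1.0).
X = np.array([
    [0.6, 0.3, 0.4],
    [0.9, 0.7, 0.8],
    [1.2, 0.9, 1.1],
    [0.4, 0.2, 0.3],
])
y = np.array([1.05, 1.30, 1.65, 1.02])

# Fit slowdown ~ 1 + w . features with ordinary least squares.
w, *_ = np.linalg.lstsq(X, y - 1.0, rcond=None)

def predict_slowdown(features):
    """Predict the slowdown factor for a candidate co-location."""
    return 1.0 + float(np.dot(w, features))

print(predict_slowdown([0.8, 0.5, 0.6]))
```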
“…In recent years, more parallelized deep learning methods have been proposed [11]. On the algorithmic side, several algorithms have been proposed to accelerate multi-GPU implementations or to make inference more accurate [1,26] and faster [7,12]. Moreover, research has been done to integrate data parallelism (DP) and model parallelism (MP) [8].…”
Section: Multi-GPU Parallel Computing (mentioning)
confidence: 99%
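To make the DP side of the statement above concrete, here is a minimal data-parallelism sketch using PyTorch's nn.DataParallel (assuming PyTorch and CUDA GPUs are available; the model, batch, and hyperparameters are placeholders for illustration). Model parallelism would instead split the layers themselves across devices.

```python
# Minimal data-parallel training step: the batch is split across all visible
# GPUs, each replica computes gradients on its slice, and gradients are summed.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 512), nn.ReLU(), nn.Linear(512, 10))

if torch.cuda.is_available():
    model = nn.DataParallel(model).cuda()  # replicate the model on every GPU

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# One training step over a synthetic batch (placeholder data).
inputs = torch.randn(64, 1024)
labels = torch.randint(0, 10, (64,))
if torch.cuda.is_available():
    inputs, labels = inputs.cuda(), labels.cuda()

optimizer.zero_grad()
loss = criterion(model(inputs), labels)
loss.backward()
optimizer.step()
```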