2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
DOI: 10.1109/ipdps.2019.00029

Runtime Concurrency Control and Operation Scheduling for High Performance Neural Network Training

Abstract: Training a neural network (NN) often uses a machine learning framework such as TensorFlow or Caffe2. These frameworks employ a dataflow model in which NN training is modeled as a directed graph composed of a set of nodes. Operations in NN training are typically implemented by the frameworks as primitives and represented as nodes in the dataflow graph. Training NN models in a dataflow-based machine learning framework involves a large number of fine-grained operations which present diverse memory access patterns…
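To make the dataflow model concrete, below is a minimal sketch (not taken from the paper; the layer sizes and names are illustrative) of a single training step traced into a TensorFlow dataflow graph, where each primitive operation (MatMul, Relu, the gradient and update ops) becomes a node that a runtime scheduler can order or run concurrently.

```python
# Minimal sketch: one training step expressed as a TensorFlow dataflow graph.
# tf.function traces the Python code into a graph whose nodes are fine-grained
# primitive operations, the representation the paper's runtime scheduler targets.
import tensorflow as tf

w1 = tf.Variable(tf.random.normal([784, 256]))   # illustrative layer sizes
w2 = tf.Variable(tf.random.normal([256, 10]))
opt = tf.keras.optimizers.SGD(learning_rate=0.01)

@tf.function  # compiles the step into a dataflow graph of primitive ops
def train_step(x, y):
    with tf.GradientTape() as tape:
        h = tf.nn.relu(tf.matmul(x, w1))          # MatMul + Relu nodes
        logits = tf.matmul(h, w2)                 # another MatMul node
        loss = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,
                                                            logits=logits))
    grads = tape.gradient(loss, [w1, w2])         # gradient ops added to the graph
    opt.apply_gradients(zip(grads, [w1, w2]))     # parameter-update ops
    return loss
```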

Cited by 8 publications (10 citation statements) · References 16 publications
“…This indicates that the computation across time steps remains stable and hence is highly predictable. This observation is consistent with the existing work that leverages predictability of deep learning workloads for performance optimization [50], [51].…”
Section: B. Performance Analysis (supporting; confidence: 91%)
“…Such predictability allows us to apply dynamic profiling on a few training steps to collect workload characterization, based on which we guide operation scheduling and power management in the future training steps. Predictability of execution time during the training has been leveraged in the existing work [50], [51]. We expect to leverage the predictability of other characterization in the future work.…”
Section: Discussion and Future Research Directions (mentioning; confidence: 99%)
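The two quotes above describe profiling a few early training steps and reusing the measurements for later steps, relying on the observation that step times are highly predictable. A minimal sketch of that idea, assuming a generic `train_step(x, y)` callable (the function and parameter names are hypothetical, not from the cited work):

```python
# Sketch of dynamic profiling: time the first few training steps and reuse the
# average as a prediction for all future steps, e.g. to guide operation
# scheduling or power management decisions.
import time

def profile_steps(train_step, batches, n_profile=5):
    """Run n_profile steps and return the average wall-clock time per step."""
    times = []
    for i, (x, y) in enumerate(batches):
        if i >= n_profile:
            break
        start = time.perf_counter()
        train_step(x, y)
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)  # predicted time for future steps
```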
“…Despite concentrating on GPUs, Liu et al. [19] proposed a lightweight machine-learning-based performance model to choose the number of threads to use for parallelizing the training of a neural network (NN). They chose to use non-deterministic features collected by hardware counters, namely the number of CPU cycles, the number of cache misses, the number of accesses to the last cache level, and the number of level-1 cache hits.…”
Section: Related Contributions (mentioning; confidence: 99%)
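As an illustration of the approach attributed to Liu et al. [19], the sketch below fits a lightweight model on hardware-counter features to pick a thread count; the counter values and labels are synthetic placeholders, and scikit-learn stands in for whatever model the original work used.

```python
# Sketch of a learned performance model: hardware-counter features from a short
# probe run are mapped to the thread count predicted to perform best.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Each row: [cpu_cycles, cache_misses, llc_accesses, l1_hits] from a probe run;
# label: thread count that performed best for that configuration (synthetic data).
X_train = np.array([[1.2e9, 4.0e6, 9.0e5, 7.5e7],
                    [3.4e9, 2.1e7, 4.8e6, 2.0e8],
                    [8.9e9, 6.3e7, 1.1e7, 5.4e8]])
y_train = np.array([4, 8, 16])  # best-performing thread counts

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

probe = np.array([[2.8e9, 1.8e7, 3.9e6, 1.7e8]])  # counters from a new workload
print("predicted thread count:", model.predict(probe)[0])
```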
“…In this paper, we propose using machine learning to directly predict the optimal chunk-size to achieve the best performance instead of predicting the execution time. Also, we do not attempt to find the optimal number of cores to run an application on, as in [19]. In our research, it is assumed that the user is working on a given number of cores and simply wants to find the optimal way to share the workload between these cores.…”
Section: Related Contributions (mentioning; confidence: 99%)
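A rough sketch of the chunk-size prediction idea described in this quote, with synthetic training data and a simple regressor standing in for the cited paper's actual model; the feature choices here are assumptions, not from the cited work.

```python
# Sketch: learn a mapping from workload features to a chunk size, then use the
# prediction to split loop iterations across a fixed number of cores.
import numpy as np
from sklearn.linear_model import LinearRegression

# Features per workload: [total_iterations, per-iteration cost variance, n_cores]
# Target: chunk size that gave the best measured performance (synthetic data).
X = np.array([[10000, 0.1, 8], [10000, 0.9, 8], [50000, 0.5, 16]])
y = np.array([256, 32, 128])
chunk_model = LinearRegression().fit(X, y)

def schedule(total_iters, cost_variance, n_cores):
    """Partition the iteration space into chunks of the predicted size."""
    chunk = max(1, int(chunk_model.predict([[total_iters, cost_variance, n_cores]])[0]))
    return [(start, min(start + chunk, total_iters))
            for start in range(0, total_iters, chunk)]

print(schedule(10000, 0.5, 8)[:3])  # first few (start, end) chunk ranges
```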