2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)
DOI: 10.1109/icdcs.2017.259
Evaluation of Deep Learning Frameworks Over Different HPC Architectures

Cited by 59 publications (25 citation statements)
References 8 publications
“…Finally, employing NVLink (either P2P or CL) introduces additional overhead (e.g., enable/disable peer access, routing, NCCL initialization, etc). To summarize, a faster GPU interconnect such as NVLink has been reported to be beneficial for accelerating modern deep-learning frameworks [31], [32]. However, regarding general GPGPU applications, without (i) replacing the underlying CPU-centric master-slave programming model by a more distributed parallelization model, or (ii) migrating the communication master role to a GPU (e.g., offloading GPU communication control from CPU to GPU via techniques such as NVSHMEM [33]), optimized inter-GPU communication via faster intra-node GPU interconnect such as NVLinks can hardly become significant enough to lift the entire application's speedup.…”
Section: Impact of NVLink
confidence: 99%
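The excerpt above refers to enabling peer access and NCCL initialization as one-time overheads of using NVLink. The citing paper's benchmark code is not reproduced here; the following is a minimal sketch, assuming PyTorch with a CUDA/NCCL build and at least two local GPUs, of what those setup steps look like in practice.

```python
# Minimal sketch (not from the cited papers): the per-GPU peer-access check
# and NCCL process-group initialization whose setup cost the excerpt above
# refers to. Assumes PyTorch built with CUDA/NCCL and >= 2 local GPUs.
import os
import torch
import torch.distributed as dist

def check_p2p_topology():
    """Report which GPU pairs can talk directly (e.g., over NVLink or PCIe P2P)."""
    n = torch.cuda.device_count()
    for src in range(n):
        for dst in range(n):
            if src != dst:
                ok = torch.cuda.can_device_access_peer(src, dst)
                print(f"GPU {src} -> GPU {dst}: peer access {'yes' if ok else 'no'}")

def init_nccl(rank: int, world_size: int):
    """NCCL process-group initialization -- one of the one-time overheads
    mentioned in the excerpt (run once per worker process)."""
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

if __name__ == "__main__":
    check_p2p_topology()
```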
“…Shams et al. in [14] already performed a study of HPC clusters with machine learning frameworks. However, they include accelerators (KNL and GPU) and do not include considerations about the effects of optimized linear algebra routines.…”
Section: Related and Future Work
confidence: 99%
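The excerpt points out that the effect of optimized linear algebra routines was not analysed. As a minimal sketch (not from either paper), one can inspect which BLAS/LAPACK backend a NumPy installation is linked against and time a single GEMM-sized multiply; the problem size below is an illustrative assumption.

```python
# Minimal sketch (not from the cited study): inspect the linked BLAS/LAPACK
# backend and time one matrix multiply, since the excerpt notes that the
# effect of optimized linear algebra routines was not considered.
import time
import numpy as np

# Show the linked BLAS/LAPACK libraries (e.g., OpenBLAS, MKL, reference BLAS).
np.show_config()

# Rough single-node timing of a GEMM-sized workload; problem size is arbitrary.
n = 4096
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)

start = time.perf_counter()
c = a @ b
elapsed = time.perf_counter() - start
print(f"{n}x{n} float32 matmul: {elapsed:.3f} s")
```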
“…In addition, TensorFlow [24] and Caffe [25] can also be used for distributed training. The performance of distributed training frameworks has been studied on the high-performance computing (HPC) architecture, in a previous study [26]. This study analyzed the training time of Caffe, TensorFlow, and SINGA with respect to the CPU and GPU configurations in HPC, using an Intel Xeon and IBM Power 8 Processor.…”
Section: Related Work
confidence: 99%
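The cited study's benchmark scripts are not shown in the excerpt; below is a minimal sketch of single-node multi-GPU data-parallel training in TensorFlow, the kind of CPU/GPU configuration the excerpt describes. The model, dataset, and batch size are placeholder assumptions, not taken from [26].

```python
# Minimal sketch (not the benchmark from [26]): single-node multi-GPU data
# parallelism in TensorFlow via MirroredStrategy, the kind of configuration
# whose training time the cited study measures for Caffe, TensorFlow, and SINGA.
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # replicates the model across local GPUs
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Placeholder model; the cited study uses standard CNN benchmarks instead.
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="sgd",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.astype("float32") / 255.0

# The global batch size is split across replicas by the strategy.
model.fit(x_train, y_train, batch_size=256, epochs=1)
```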