2017 IEEE 23rd International Conference on Parallel and Distributed Systems (ICPADS)
DOI: 10.1109/icpads.2017.00097

Parallel I/O Optimizations for Scalable Deep Learning

Cited by 22 publications (8 citation statements)
References 7 publications

“…For example, recent studies [27,28,30,31,35] have documented the results on the evaluation of different DL applications on different HPC systems, but all of these mainly deal with the computation characterization. There have been some mentionable efforts [23,24,41,42,53,54] on I/O profiling and optimization for DL training workloads. Among them, Zhu et al [53] have used BeeGFS as one of the baseline PFS for the comparison with the DeepIO implementation.…”
Section: Related Work
confidence: 99%
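The statement above concerns parallel I/O optimization for DL training workloads. As a rough illustration of the techniques such works profile (parallel shard reads plus prefetching), the following tf.data sketch may help; it is not the DeepIO implementation cited above, and the file pattern and tuning constants are placeholders.

import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE

def make_dataset(file_pattern="train-*.tfrecord", batch_size=128):
    # Shuffle the shard list so epochs do not always start on the same file.
    files = tf.data.Dataset.list_files(file_pattern, shuffle=True)
    # Read several shards concurrently instead of sequentially.
    ds = files.interleave(
        tf.data.TFRecordDataset,
        cycle_length=8,              # shards read in parallel (placeholder value)
        num_parallel_calls=AUTOTUNE,
    )
    ds = ds.shuffle(10_000)          # randomize sample order across shards
    ds = ds.batch(batch_size)
    # Overlap storage I/O with computation: stage the next batch while
    # the accelerator processes the current one.
    return ds.prefetch(AUTOTUNE)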
“…Regarding the issue of I/O and storage for deep learning, both the HPC and deep learning communities have, so far, dedicated most efforts to accessing large training datasets efficiently [10], [11], [12], [13], while leaving the problem of optimized checkpointing of learning models largely ignored. TensorFlow checkpoints models to files in its SavedModel format, or in HDF5 files through Keras.…”
Section: Related Work
confidence: 99%
“…DNN model checkpointing: The problem of checkpointing DNN models efficiently is beginning to emerge in deep learning, where most efforts so far focus on efficient access to training batches [28], [29], [30], [31]. TensorFlow checkpoints models to files in its SavedModel format, or in HDF5 files through Keras.…”
Section: Background and Problem Formulation
confidence: 99%
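Both citation statements above point at TensorFlow's two built-in checkpointing paths. A minimal sketch of those paths with TensorFlow 2.x Keras (pre-Keras-3 defaults) follows; the model architecture and file names are illustrative, and this is not the checkpointing scheme proposed by the citing papers.

import tensorflow as tf

# Toy model; the architecture is a placeholder.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(32,)),
    tf.keras.layers.Dense(10),
])
model.compile(optimizer="adam", loss="mse")

# SavedModel format: a directory holding the graph, weights, and signatures.
model.save("ckpt/saved_model")

# HDF5 format through Keras: a single .h5 file (architecture + weights).
model.save("ckpt/model.h5")

# Either checkpoint restores to a usable model.
restored = tf.keras.models.load_model("ckpt/saved_model")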