2018 IEEE 26th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2018)
DOI: 10.1109/mascots.2018.00023

Entropy-Aware I/O Pipelining for Large-Scale Deep Learning on HPC Systems

Cited by 57 publications (40 citation statements)
References 15 publications
“…While reading a huge dataset, read requests to the physical backend devices happen frequently, since the dataset cannot fit entirely in the PFS's cache. These frequent I/O requests to read all the data of a large dataset at each epoch lead to lower I/O performance than for smaller datasets [24,53].…”
Section: Dataset Size
confidence: 99%
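The statement above concerns datasets that exceed the parallel file system's cache, so every epoch re-reads the entire dataset from the backend storage. Below is a minimal, hypothetical tf.data sketch (the paths and batch size are invented for illustration) of that per-epoch read pattern and of staging the data on node-local storage with cache(), one common mitigation; it is not the pipelining scheme proposed in the cited paper.

```python
# Hypothetical sketch: per-epoch reads from a PFS-resident dataset, with an
# optional node-local cache so later epochs avoid hitting the PFS backend.
import tensorflow as tf

files = tf.data.Dataset.list_files("/pfs/train-*.tfrecord")   # hypothetical PFS path
ds = tf.data.TFRecordDataset(files)

# Without cache(), every epoch below re-reads all records from the PFS.
# cache() writes them once to node-local scratch and serves later epochs from there.
ds = ds.cache("/local_scratch/train_cache")                    # hypothetical local path

for epoch in range(3):
    for batch in ds.batch(256):
        pass  # training step would go here
```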
“…Although some frameworks (e.g., TensorFlow) support local shuffling after sequentially reading a few elements from a batched file, randomly reading small raw images is the general practice for ensuring a randomized input sequence. These massive small random reads impose a non-trivial performance loss compared to sequential reads of large batched files [24,53].…”
Section: Random File
confidence: 99%
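The excerpt contrasts two input pipelines: random reads of many small raw image files versus sequential reads of large batched files with a bounded local shuffle. The sketch below illustrates both with the tf.data API; the paths, image format, and buffer size are assumptions made for illustration only.

```python
# Hypothetical sketch of the two access patterns described in the excerpt.
import tensorflow as tf

# Pattern 1: random reads of small raw image files (many small random I/O requests).
image_files = tf.data.Dataset.list_files("/pfs/train/*.jpg", shuffle=True)   # hypothetical path
def load_image(path):
    raw = tf.io.read_file(path)                  # one small random read per image
    return tf.io.decode_jpeg(raw, channels=3)
random_read_ds = image_files.map(load_image)

# Pattern 2: sequential reads of large batched files, randomized only by a
# bounded in-memory shuffle buffer (local shuffling).
record_files = tf.data.Dataset.list_files("/pfs/train-*.tfrecord", shuffle=False)  # hypothetical
sequential_read_ds = tf.data.TFRecordDataset(record_files).shuffle(buffer_size=10_000)
```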
“…Regarding the issue of I/O and storage for deep learning, both the HPC and deep learning communities have so far dedicated most of their efforts to accessing large training datasets efficiently [10], [11], [12], [13], while leaving the problem of optimized checkpointing of learning models largely ignored. TensorFlow checkpoints models to files in its SavedModel format, or to HDF5 files through Keras.…”
Section: Related Work
confidence: 99%
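Since the excerpt names the two checkpointing paths TensorFlow offers, here is a minimal sketch of both, assuming the TensorFlow 2.x Keras API; the toy model and output paths are hypothetical, and the cited papers' own checkpointing optimizations are not reproduced here.

```python
# Hypothetical sketch: saving a Keras model as a SavedModel directory and as an HDF5 file.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),
])

tf.saved_model.save(model, "/pfs/checkpoints/model_savedmodel")  # SavedModel directory (hypothetical path)
model.save("/pfs/checkpoints/model.h5")                          # single HDF5 file via Keras (hypothetical path)
```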
“…DNN model checkpointing: The problem of checkpointing DNN models efficiently is only beginning to emerge in deep learning, where most efforts so far focus on efficient access to training batches [28], [29], [30], [31]. TensorFlow checkpoints models to files in its SavedModel format, or to HDF5 files through Keras.…”
Section: Background and Problem Formulation
confidence: 99%
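To make the checkpointing discussion concrete, the sketch below periodically writes checkpoints during a toy training loop with TensorFlow's object-based tf.train.Checkpoint and CheckpointManager; the model, data, checkpoint interval, and directory are all assumptions, not the approach evaluated in the cited work.

```python
# Hypothetical sketch: periodic DNN checkpointing during training.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

ckpt = tf.train.Checkpoint(model=model, optimizer=optimizer)
manager = tf.train.CheckpointManager(ckpt, "/pfs/checkpoints/run0", max_to_keep=3)  # hypothetical path

x = tf.random.normal((256, 32))   # synthetic data for illustration
y = tf.random.normal((256, 1))

for step in range(1, 101):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(model(x) - y))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    if step % 20 == 0:
        manager.save(checkpoint_number=step)   # write a checkpoint every 20 steps
```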