2020 IEEE International Conference on Cluster Computing (CLUSTER) 2020
DOI: 10.1109/cluster49012.2020.00046

tf-Darshan: Understanding Fine-grained I/O Performance in Machine Learning Workloads

Abstract: Machine Learning applications on HPC systems have been gaining popularity in recent years. The upcoming large scale systems will offer tremendous parallelism for training through GPUs. However, another heavy aspect of Machine Learning is I/O, and this can potentially be a performance bottleneck. TensorFlow, one of the most popular Deep-Learning platforms, now offers a new profiler interface and allows instrumentation of TensorFlow operations. However, the current profiler only enables analysis at the TensorFlo…

Cited by 13 publications
(2 citation statements)
References 27 publications
“…Darshan is an I/O characterization tool that collects information from applications without considerably affecting their performance. Developed at the Argonne laboratory, it is very popular in studies that use its logs to analyze supercomputer workloads and characterize the behavior of I/O operations [Pavan et al 2019, Devarajan et al 2020, Chien et al 2020.…”
Section: Darshan Tool (unclassified)
“…As emerging big data and machine learning applications produce and process a large amount of data, the storage performance is becoming more and more important [1], [2], [3]. To improve storage performance, flash-based solid-state drives (SSDs) have been widely adopted in both industry and academia as they provide higher bandwidth and lower latency compared with the existing hard disk drives (HDDs) [4].…”
Section: Introduction (mentioning)
confidence: 99%