2020
DOI: 10.48550/arxiv.2007.00279
Preprint

HPC AI500: The Methodology, Tools, Roofline Performance Models, and Metrics for Benchmarking HPC AI Systems

Zihan Jiang, Lei Wang, Xingwang Xiong, et al.

Abstract: Recent years have witnessed a trend of applying large-scale distributed deep learning algorithms in both business and scientific computing, with the goal of reducing the training time needed to reach state-of-the-art quality. The HPC community has shown great interest in building HPC AI systems dedicated to running these workloads, and HPC AI benchmarks accelerate this process. Unfortunately, benchmarking HPC AI systems at scale raises serious challenges. None of the previous HPC AI benchmarks achieve the…

Cited by 2 publications (2 citation statements). References 54 publications.
“…The specification, source code, and HPC AI500 ranking numbers are publicly available from http://www.benchcouncil.org/benchhub/hpc-ai500-benchmark. A full technical report is available from [39].…”
Section: Discussion (mentioning, confidence: 99%)
“…Algorithm and hardware implementation choices, such as numerical precision (e.g., single-precision, double-precision, or mixed precision), impact the learning dynamics. Even for the same system at different scales, the interactions between system size and minibatch size significantly impact measured quantities such as time-to-quality (the training time to achieve state-of-the-art quality) and FLOPS (the computation overhead) [25,20,47,31,38].…”
Section: The Benchmarking Challenges: Extrinsic Properties Process En... (mentioning, confidence: 99%)