2016
DOI: 10.1007/978-3-319-46079-6_17

High Performance Computing on the IBM Power8 Platform

Cited by 6 publications (3 citation statements)
References 4 publications
“…This is only about 4% and 3.2% of the DGEMM peak of the two processors on each system (623.19 GFlop/s and 482.53 GFlop/s respectively). For the two-socket POWER8, the most time-consuming kernel achieved about 52 GFlop/s, which is about 10% of peak (501 GFlop/s [51]). Such low achieved performance relative to peak is even more pronounced on the P100 and V100 GPUs, where less than 3% of peak (4.7 TFlop/s and 7 TFlop/s respectively) was achieved on either GPU for the most time-consuming kernel.…”
Section: Computation and Bandwidth Performance (mentioning)
confidence: 99%
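
The fraction-of-peak arithmetic quoted above is easy to reproduce. The sketch below (Python) is illustrative only: it reuses the figures from the excerpt (52 GFlop/s sustained against a 501 GFlop/s two-socket POWER8 DGEMM peak), and the helper name percent_of_peak is ours, not from either paper.

    # Minimal sketch of the fraction-of-peak arithmetic in the excerpt above.
    # Figures come from the quoted text; the helper name is illustrative.
    def percent_of_peak(sustained_gflops, peak_gflops):
        """Sustained performance as a percentage of machine peak."""
        return 100.0 * sustained_gflops / peak_gflops

    # Two-socket POWER8: ~52 GFlop/s sustained vs. a 501 GFlop/s DGEMM peak.
    print(f"POWER8 kernel: {percent_of_peak(52.0, 501.0):.1f}% of peak")  # ~10.4%

    # V100: less than 3% of a 7 TFlop/s peak is under ~210 GFlop/s sustained.
    print(f"V100 upper bound: {0.03 * 7000.0:.0f} GFlop/s")
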
“…[19] compare OpenMP 4.5 with Cray to OpenACC on Nekbone; however, that analysis is also restricted to runtimes, and the focus is more on programmability. We are not aware of academic papers studying the performance of CUDA Fortran or OpenMP 4 in the IBM XL compilers aside from early results in our own previous work [20]. There is also very little work comparing the performance of CUDA code compiled with nvcc and clang.…”
Section: Related Work (mentioning)
confidence: 99%
“…The HPC system software for alternative platforms is still under development; for example, the first math libraries for ARM-based servers were released three years ago [93]. Similar studies confirm that the system software stack on alternative platforms is relatively immature, which limits the achievable performance [88,94,95]. Finally, ThunderX shows very low FLOP/s and memory-bandwidth utilization of 23% and 27%, respectively.…”
Section: Theoretical vs. Sustained Flops/s and Memory Bandwidth (mentioning)
confidence: 97%
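
The utilization figures in this last excerpt follow the same pattern: sustained FLOP/s and memory bandwidth expressed as a share of the theoretical maxima. The sketch below is a hypothetical illustration; the socket and core counts, clock, FLOPs per cycle, and measured values are placeholders we chose, not numbers from the cited work.

    # Hypothetical illustration of theoretical vs. sustained utilization.
    # All machine parameters and measurements are placeholders, not data
    # from the cited papers.
    def peak_gflops(sockets, cores, clock_ghz, flops_per_cycle):
        """Theoretical double-precision peak in GFlop/s."""
        return sockets * cores * clock_ghz * flops_per_cycle

    def utilization(sustained, theoretical):
        """Sustained value as a percentage of the theoretical maximum."""
        return 100.0 * sustained / theoretical

    peak = peak_gflops(sockets=2, cores=48, clock_ghz=2.0, flops_per_cycle=8)
    print(f"Theoretical peak:      {peak:.0f} GFlop/s")
    print(f"FLOP/s utilization:    {utilization(350.0, peak):.0f}%")
    print(f"Bandwidth utilization: {utilization(40.0, 150.0):.0f}%  (GB/s)")
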