2018
DOI: 10.1504/ijcse.2018.095847

Evaluating attainable memory bandwidth of parallel programming models via BabelStream

Abstract: Many scientific codes consist of memory bandwidth bound kernels: the dominating factor of the runtime is the speed at which data can be loaded from memory into the Arithmetic Logic Units, before results are written back to memory. One major advantage of many-core devices such as General Purpose Graphics Processing Units (GPGPUs) and the Intel Xeon Phi is their focus on providing increased memory bandwidth over traditional CPU architectures. However, as with CPUs, this peak memory bandwidth is usually unachievab…

Cited by 48 publications (27 citation statements)
References 9 publications
“…The number reported in the chart corresponds to the achieved bandwidth for the Triad kernel. To measure the sustained cache bandwidths, we used the methodology described in our previous work, utilizing BabelStream. The Triad kernel from the BabelStream benchmark was run in a tight loop on each core simultaneously, with problem sizes selected to ensure residency in each level of the cache.…”
Section: Benchmarking Results; citation type: mentioning
confidence: 99%
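
For readers unfamiliar with the Triad kernel named in this statement, the following is a minimal Python sketch, not the BabelStream implementation. It assumes the usual STREAM-style definition `a[i] = b[i] + scalar * c[i]` and counts three arrays of traffic per pass (two reads, one write). The function names, the `repeats` parameter, and the scalar value are hypothetical choices for illustration. Interpreter overhead dominates in pure Python, so the figure it returns only illustrates the calculation and is not a meaningful hardware measurement.

```python
import time
from array import array


def triad(a, b, c, scalar):
    """STREAM-style Triad: a[i] = b[i] + scalar * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


def triad_bandwidth(n, repeats=5, scalar=0.4):
    """Return (best bandwidth in bytes/s, result array) for n elements.

    n is the number of elements per array. Picking n so that all three
    arrays fit in a given cache level is how a run would target that level.
    """
    a = array("d", [0.0] * n)
    b = array("d", [1.0] * n)
    c = array("d", [2.0] * n)

    # Traffic per pass: read b, read c, write a.
    bytes_moved = 3 * n * a.itemsize

    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        triad(a, b, c, scalar)
        best = min(best, time.perf_counter() - start)

    return bytes_moved / best, a


if __name__ == "__main__":
    bandwidth, _ = triad_bandwidth(1 << 14)
    print(f"Triad (pure Python, illustrative only): {bandwidth / 1e6:.1f} MB/s")
```

Taking the best of several repeats is a common convention in bandwidth benchmarks because it filters out one-off interference such as cold caches on the first pass.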
“…Effective peak single-precision FP performance is obtained from the SHOC benchmark [12]. Effective peak bandwidth is obtained via BabelStream [14]. Copy times are estimated at effective peak bandwidth.…”
Section: Page Re-migration; citation type: mentioning
confidence: 99%
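
The copy-time estimate mentioned in this statement amounts to dividing the transfer size by the measured bandwidth. A minimal sketch of that arithmetic follows. The function name and the example figures (a 1 GB transfer at 10 GB/s) are hypothetical and are not taken from the citing paper.

```python
def estimated_copy_time(num_bytes, effective_bandwidth_bytes_per_s):
    """Estimate copy time in seconds as size divided by effective bandwidth."""
    if effective_bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return num_bytes / effective_bandwidth_bytes_per_s


if __name__ == "__main__":
    # Hypothetical figures: 1 GB copied at an effective 10 GB/s.
    seconds = estimated_copy_time(1_000_000_000, 10_000_000_000)
    print(f"Estimated copy time: {seconds:.3f} s")
```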
“…The portability of programming models across a range of architectures was explored in the BabelStream benchmark [5], and we include the latest results in this paper. The performance of a number of parallel programming models on GPUs was explored using the TeaLeaf mini-app [6], showing that each model can achieve similar performance.…”
Section: A Related Work; citation type: mentioning
confidence: 99%
“…For this study, we have selected five mini-apps to represent critical workloads on many of the largest supercomputers in the world: BabelStream [5], CloverLeaf [4], TeaLeaf [11], Neutral [12] and MiniFMM [13]. A short description of each mini-app will be given in Section III.…”
Section: Systematic Evaluation Of Performance Portability; citation type: mentioning
confidence: 99%