2016
DOI: 10.1002/cpe.3921

Performance analysis of the Kahan‐enhanced scalar product on current multi‐core and many‐core processors

Abstract: We investigate the performance characteristics of a numerically enhanced scalar product (dot) kernel loop that uses the Kahan algorithm to compensate for numerical errors, and describe efficient SIMD-vectorized (single instruction, multiple data) implementations on recent multi-core and many-core processors. Using low-level instruction analysis and the execution-cache-memory (ECM) performance model, we pinpoint the relevant performance bottlenecks for single-core and thread-parallel execution and predict performance and sat…
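To make the kernel concrete, here is a minimal C sketch of a plain and a Kahan-compensated scalar product. It uses the textbook Kahan update and is not necessarily the exact loop studied in the paper; the function names `dot_naive`, `kahan_add`, `dot_kahan`, and `dot_kahan_lanes` are hypothetical. The lane variant only imitates in scalar code how a SIMD implementation keeps one running sum and one compensation term per vector lane, with four lanes assumed (four doubles fit in a 256-bit AVX register). The sketch assumes IEEE 754 double arithmetic compiled without value-unsafe optimizations such as `-ffast-math`, which allow the compiler to simplify the compensation term away.

```c
#include <stddef.h>

/* Plain scalar product: every addition rounds, and the errors accumulate in sum. */
double dot_naive(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}

/* One textbook Kahan step: add x to *sum, carrying the rounding error in *c. */
static void kahan_add(double *sum, double *c, double x)
{
    double y = x - *c;    /* re-inject the error lost in the previous step         */
    double t = *sum + y;  /* large + small: low-order bits of y may be rounded off */
    *c = (t - *sum) - y;  /* what the rounded addition gained beyond y             */
    *sum = t;
}

/* Kahan-compensated scalar product, scalar reference version. */
double dot_kahan(const double *a, const double *b, size_t n)
{
    double sum = 0.0, c = 0.0;
    for (size_t i = 0; i < n; ++i)
        kahan_add(&sum, &c, a[i] * b[i]);
    return sum;
}

/* Hypothetical lane variant: LANES independent (sum, c) pairs, one per
 * assumed SIMD lane (LANES = 4 is an assumption, e.g. 256-bit registers). */
#define LANES 4

double dot_kahan_lanes(const double *a, const double *b, size_t n)
{
    double s[LANES] = {0.0}, c[LANES] = {0.0};
    size_t i = 0;
    for (; i + LANES <= n; i += LANES)       /* main loop, one element per lane */
        for (int k = 0; k < LANES; ++k)
            kahan_add(&s[k], &c[k], a[i + k] * b[i + k]);
    for (; i < n; ++i)                       /* remainder loop, fed into lane 0 */
        kahan_add(&s[0], &c[0], a[i] * b[i]);

    /* Horizontal reduction: lane k holds approximately s[k] - c[k]. */
    double sum = 0.0, cc = 0.0;
    for (int k = 0; k < LANES; ++k) {
        kahan_add(&sum, &cc, s[k]);
        kahan_add(&sum, &cc, -c[k]);
    }
    return sum;
}
```

For example, with a[0] = 1.0 followed by ten entries of 1e-16 and all b[i] = 1.0, `dot_naive` returns exactly 1.0 because each 1e-16 is below half a unit in the last place of 1.0, while both compensated versions keep approximately the missing 1e-15. The price is four additions or subtractions per element instead of one.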

Cited by 6 publications (10 citation statements); references 22 publications.
“…A full introduction to the ECM model would exceed the scope of this paper, so we refer to the references given above. The model has been shown to work well for the analysis of implementations of several important computational kernels [19,18,23,2,9].…”
Section: Performance on Modern Multicore CPUs (mentioning, confidence: 99%)
“…Instead, time contributions from in-core execution and data transfers through the memory hierarchy are calculated and then put together according to the properties of a particular processor architecture; for instance, in Intel x86 server CPUs all time contributions from data transfers including LOADs and STOREs in the L1 cache must be added to get a prediction of single-core data transfer time [18,8]. On the other hand, the IBM Power8 processor shows almost perfect overlap [9]. A full introduction to the ECM model would exceed the scope of this paper, so we refer to the references given above.…”
Section: Performance on Modern Multicore CPUs (mentioning, confidence: 99%)
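The composition rule quoted above can be written down as two small functions. This is a sketch under stated assumptions: all inputs are times in cycles per cache line of work, `t_ol` is the in-core time that can overlap with data transfers, `t_nol` is the non-overlapping in-core time (the L1 LOADs and STOREs), and the remaining fields are transfer times between adjacent levels of the memory hierarchy for a data set in main memory. The struct and function names are hypothetical, and the full-overlap function is only the idealized limit of the "almost perfect overlap" reported for POWER8, not a calibrated model of that chip.

```c
/* Hypothetical ECM inputs for one loop kernel, in cycles per cache line. */
typedef struct {
    double t_ol;     /* in-core time that overlaps with data transfers        */
    double t_nol;    /* non-overlapping in-core time: L1 LOADs and STOREs     */
    double t_l1l2;   /* transfer time between L2 and L1                       */
    double t_l2l3;   /* transfer time between L3 and L2                       */
    double t_l3mem;  /* transfer time between main memory and L3              */
} ecm_inputs;

static double max2(double x, double y) { return x > y ? x : y; }

/* No overlap among data transfers (as quoted for Intel x86 server CPUs):
 * all transfer contributions add up; only t_ol runs in parallel with them. */
double ecm_no_overlap(ecm_inputs in)
{
    return max2(in.t_ol, in.t_nol + in.t_l1l2 + in.t_l2l3 + in.t_l3mem);
}

/* Idealized full overlap: every contribution runs concurrently,
 * so the slowest single contribution determines the runtime. */
double ecm_full_overlap(ecm_inputs in)
{
    double t = max2(in.t_ol, in.t_nol);
    t = max2(t, in.t_l1l2);
    t = max2(t, in.t_l2l3);
    return max2(t, in.t_l3mem);
}
```

With hypothetical inputs of 4, 2, 2, 4, and 9 cycles, the additive rule predicts 17 cycles per cache line and the full-overlap limit predicts 9, which shows how strongly the overlap assumption alone can change the single-core prediction.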
“…the single-core model, are presented here. Readers interested in the full ECM model can find the most recent version for Intel Xeon, Intel Xeon Phi, and IBM POWER8 processors here [6].…”
Section: The ECM Performance Model (mentioning, confidence: 99%)
“…Paper investigates the performance characteristics of a numerically enhanced scalar product kernel loop that uses the Kahan algorithm to compensate for numerical errors. By applying the model‐guided performance engineering, the author's approach enables the system to detect the relevant performance bottlenecks for single‐core and thread‐parallel executions and to predict performance and saturation behavior.…”
(mentioning, confidence: 99%)
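The saturation behavior mentioned in this statement is commonly derived from the single-core ECM prediction with a simple scaling rule: performance grows linearly with the number of cores until the memory-bandwidth bottleneck (the T_L3Mem contribution) is reached, so saturation is expected at n_s = ⌈T_ECM^Mem / T_L3Mem⌉ cores. The C sketch below assumes exactly this idealized rule, with times in cycles per cache line; the function names are hypothetical, and real processors may deviate from it near the saturation point.

```c
/* Smallest core count n with n * t_l3mem >= t_ecm_mem, i.e.
 * ceil(t_ecm_mem / t_l3mem), computed without <math.h>.
 * Both times in cycles per cache line; assumes t_l3mem > 0. */
unsigned ecm_saturation_cores(double t_ecm_mem, double t_l3mem)
{
    unsigned n = (unsigned)(t_ecm_mem / t_l3mem); /* truncation = floor for positive values */
    if (n * t_l3mem < t_ecm_mem)
        ++n;
    return n;
}

/* Idealized throughput of n cores in cache lines per cycle:
 * linear scaling n / T_ECM until the memory limit 1 / T_L3Mem. */
double ecm_cl_per_cycle(unsigned n, double t_ecm_mem, double t_l3mem)
{
    double linear = n / t_ecm_mem;
    double bw_limit = 1.0 / t_l3mem;
    return linear < bw_limit ? linear : bw_limit;
}
```

With hypothetical values T_ECM^Mem = 17 and T_L3Mem = 9 cycles per cache line, the rule predicts saturation at 2 cores; adding further cores leaves the predicted throughput at 1/9 cache lines per cycle.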