2010
DOI: 10.1007/978-3-642-14390-8_64

Introducing a Performance Model for Bandwidth-Limited Loop Kernels

Abstract: We present a performance model for bandwidth-limited loop kernels which is founded on the analysis of modern cache-based microarchitectures. This model allows accurate performance prediction and evaluation for existing instruction codes. It provides an in-depth understanding of how performance at different memory hierarchy levels is made up. The performance of raw memory load, store and copy operations and a stream vector triad is analyzed and benchmarked on three modern x86-type quad-core architectures i…

Cited by 49 publications (67 citation statements)
References 1 publication
“…The consequence is that single core performance is not quantitatively accurate for memory-bound kernels (it is too optimistic). Hence, in particular, multicore scalability can not be inferred from this model [3,8]. Moreover, the GFlop/s metric which is generally used with Roofline models has some drawbacks from a practical methodological point of view.…”
Section: The Roofline Model (mentioning)
confidence: 99%
“…The ECM model ([3], [8]) is a refinement of the Roofline model for multicore CPUs that still neglects any latency effects, but takes into account the cache hierarchy. It uses the cycles per cacheline worth of data (cy/CL) performance metric.…”
Section: The ECM Model (mentioning)
confidence: 99%
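As a rough illustration of the cy/CL metric mentioned in the statement above, the following sketch converts a cycles-per-cache-line figure into a data transfer rate. The function name, the 64-byte cache line default, and the sample numbers are assumptions chosen for illustration; they do not come from the cited papers.

```python
def cy_per_cl_to_bytes_per_second(cy_per_cl, clock_hz, cacheline_bytes=64):
    """Convert cycles per cache line (cy/CL) into bytes per second.

    cacheline_bytes=64 is an assumed cache line size (common on x86),
    not a value taken from the cited work.
    """
    if cy_per_cl <= 0:
        raise ValueError("cy_per_cl must be positive")
    # One cache line is moved every cy_per_cl cycles, so
    # bytes/s = bytes per line * cycles per second / cycles per line.
    return cacheline_bytes * clock_hz / cy_per_cl


# Hypothetical example: 8 cy/CL on a core clocked at 2.0 GHz.
rate = cy_per_cl_to_bytes_per_second(8, 2.0e9)
print(rate / 1e9, "GB/s")
```

A lower cy/CL value means a higher transfer rate at a fixed clock frequency, which is why the metric is convenient for comparing memory hierarchy levels on the same core.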
“…The roofline model can predict the performance of a simple von Neumann architecture with two levels of memory as well as the more complex design with a multi-level memory hierarchy. It has been successfully used to model the performance of many applications on the multi-core and many-core processors [22]. Recently, it has been extended to model the energy consumption in GPUs [23].…”
Section: Related Work (mentioning)
confidence: 99%
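The basic Roofline bound referred to in the statements above can be sketched as the minimum of a compute ceiling and a bandwidth ceiling. This is a minimal sketch of the standard two-level formulation; the function name, the machine numbers, and the traffic count for the triad are hypothetical and not taken from the cited papers.

```python
def roofline_gflops(peak_gflops, bandwidth_gbytes_s, flops_per_byte):
    """Attainable performance under the basic Roofline model.

    performance = min(peak compute rate, memory bandwidth * arithmetic intensity)
    """
    return min(peak_gflops, bandwidth_gbytes_s * flops_per_byte)


# Hypothetical vector triad a[i] = b[i] + c[i] * d[i] in double precision:
# 2 flops per iteration, 3 loads + 1 store of 8 bytes each = 32 bytes
# (write-allocate traffic is ignored here as a simplifying assumption).
intensity = 2 / 32

# Hypothetical machine: 40 GFlop/s peak, 20 GB/s memory bandwidth.
print(roofline_gflops(40.0, 20.0, intensity))
```

With these assumed numbers the bandwidth ceiling is far below the compute ceiling, so the kernel is memory-bound; that is the regime in which the first statement above calls the Roofline single-core prediction too optimistic.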