2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS) 2010
DOI: 10.1109/ispass.2010.5452069

StatStack: Efficient modeling of LRU caches

Abstract: The identification of the memory gap in terms of the relatively slow memory accesses put a focus on cache performance in the 90s. The introduction of the moderately clocked multicores has shifted this focus from memory latency to memory bandwidth for modern processors. The multicores' limited cache capacity per thread, in combination with their current and projected off-chip memory bandwidth limitations, makes this the most likely bottleneck of future computer systems. This paper presents a new and efficient way of …

Cited by 112 publications
(74 citation statements)
References 29 publications
“…This information is provided as input to the runtime and can be generated with help of a profiling pass. There is a plethora of prior works that use various fast profiling methods to identify which memory instructions miss in the cache hierarchy, such as [1,5,13,23]. Those memory instructions can be targeted for software prefetching as shown by [8,13,23].…”
Section: Inserting Software Prefetches (mentioning)
confidence: 99%
“…However, in this work we have focused on reuse-distance based methods to model prefetches as they can be targeted to enable improved use of shared resources. Reuse-distance based models such as [1,5] can model miss ratios for individual memory instructions. This information can be used to decide which memory instructions should be targeted for software prefetching.…”
Section: Inserting Software Prefetches (mentioning)
confidence: 99%
“…This information can be automatically generated or retrieved from a real application. The number of instructions, the memory access rate and the stack distance profile can be generated using tools such as an extension to CacheGrind (Babka et al., 2012), StatStack (Eklov and Hagersten, 2010) or MICA 2 (Hoste and Eeckhout, 2007). The base CPI requires a cycle accurate simulator.…”
Section: Memory Behavior Of a Task (mentioning)
confidence: 99%
“…MRCs capture an application's cache miss ratio as a function of the cache space available to the applications. MRCs can be generated fairly cheaply [9,6,8], and have been used in contexts such as cache partitioning [14], off-chip bandwidth partitioning [11] and cache contention modeling [7]. However, while MRCs provide significant insight into the miss ratios and data locality of applications, they are limited in their ability to predict performance.…”
Section: Miss Ratio Curves (mentioning)
confidence: 99%
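As an illustration of the miss ratio curve described in the statement above, the following minimal Python sketch derives an MRC from exact LRU stack distances. It is not StatStack's statistical method, and it is not taken from any of the cited papers. The function names, the toy trace, and the assumptions of a fully associative LRU cache with sizes measured in cache lines are all hypothetical choices made for the example.

```python
from collections import OrderedDict


def stack_distances(trace):
    """Return the LRU stack distance of each access (None for a first access)."""
    stack = OrderedDict()  # keys ordered from least to most recently used
    distances = []
    for addr in trace:
        if addr in stack:
            # Count the distinct lines touched since the previous access to addr.
            keys = list(stack)
            distances.append(len(keys) - 1 - keys.index(addr))
            stack.move_to_end(addr)
        else:
            distances.append(None)  # cold miss: the line was never seen before
            stack[addr] = True
    return distances


def miss_ratio_curve(trace, cache_sizes):
    """Map each cache size (in lines) to the miss ratio of a fully associative LRU cache."""
    dists = stack_distances(trace)
    curve = {}
    for size in cache_sizes:
        # An access hits only if fewer than `size` distinct lines intervened.
        misses = sum(1 for d in dists if d is None or d >= size)
        curve[size] = misses / len(dists)
    return curve


# Hypothetical toy trace of cache-line identifiers.
trace = ["a", "b", "c", "a", "b", "c", "a", "d"]
mrc = miss_ratio_curve(trace, [1, 2, 3, 4])
```

In this toy trace the miss ratio stays at 1.0 until the cache holds three lines, where it drops to 0.5. This linear-scan version costs time proportional to the number of distinct lines on every access, which is the kind of overhead that cheaper profiling approaches aim to avoid.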
“…As long as the sampling covers all phases of the application fairly, this would allow accurate data collection with a further reduction of the overhead. Such approaches have been used to speed up simulation [16] and stack distance collection [6]. However, we have not implemented this approach.…”
Section: Dynamically Varying the Pirate Size (mentioning)
confidence: 99%