1998
DOI: 10.1109/12.689650

Combining trace sampling with single pass methods for efficient cache simulation

Abstract: The design of the memory hierarchy is crucial to the performance of high-performance computer systems. The incorporation of multiple levels of caches into the memory hierarchy is known to increase the performance of high-end machines, but the development of architectural prototypes of various memory hierarchy designs is costly and time-consuming. In this paper, we will describe a single-pass method used in combination with trace sampling techniques to produce a fast and accurate approach for simulating multipl…
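As a point of reference only, the following is a minimal sketch, not the authors' method, of the general single-pass idea the abstract refers to: Mattson-style LRU stack processing of a (possibly sampled) address trace yields a stack-distance histogram from which miss ratios for many fully associative LRU cache sizes can be derived in one pass over the trace. The block size, trace format, and sampling scheme shown are assumptions.

```python
# Illustrative sketch (not the paper's code): single-pass LRU stack simulation.
# One traversal of the trace produces a stack-distance histogram; miss ratios
# for any fully associative LRU cache size can then be read off without
# re-simulating, which is the essence of single-pass methods.

from collections import Counter

def single_pass_lru(trace, block_bits=6):
    """Return a stack-distance histogram for a sequence of byte addresses."""
    stack = []                 # LRU stack of block addresses, MRU at index 0
    histogram = Counter()      # stack distance -> count (inf for cold misses)
    for addr in trace:
        block = addr >> block_bits
        try:
            depth = stack.index(block)       # 0-based stack distance
            stack.pop(depth)
            histogram[depth] += 1
        except ValueError:
            histogram[float("inf")] += 1     # first touch: compulsory miss
        stack.insert(0, block)               # block becomes most recently used
    return histogram

def miss_ratio(histogram, cache_blocks):
    """Misses for a fully associative LRU cache holding `cache_blocks` blocks."""
    total = sum(histogram.values())
    misses = sum(c for d, c in histogram.items()
                 if d == float("inf") or d >= cache_blocks)
    return misses / total if total else 0.0

# Hypothetical usage with a crude interval-sampled trace (sampling scheme is
# an assumption, not the paper's): keep every 10th interval of 10,000 refs.
# sampled = [a for i, a in enumerate(full_trace) if (i // 10_000) % 10 == 0]
# hist = single_pass_lru(sampled)
# for size in (256, 512, 1024):      # evaluate many cache sizes from one pass
#     print(size, miss_ratio(hist, size))
```

In a production simulator the linear stack search would typically be replaced by a more efficient structure, but the single-pass property, one traversal of the trace serving many cache configurations, is the same.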

Cited by 52 publications (37 citation statements)
References 12 publications
“…Much of the accelerated simulator literature falls under this category, only differing in which portions of the code are selected and how to fastforward to those specific points in the program [12,19,27]. Several techniques have been proposed to warmup cache state prior to measurement in order to obtain more accurate simulation results [1,4,5,10,11,27]. The MTR adds multiprocessor and directory support to these techniques.…”
Section: Related Work
confidence: 99%
“…Armed with equation (6), we can compute the expected stack distance distribution of an application given its RDS as follows: First, we compute the expected stack distance of each distinct reuse distance in the RDS using (6). Then, by weighting each of the expected stack distances with the frequency of the corresponding reuse distance in the RDS we get the expected stack distance distribution.…”
Section: Cache Model
confidence: 99%
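For readers of the excerpt above, a minimal sketch of the weighting step it describes follows. Equation (6) of the citing paper is not reproduced on this page, so expected_stack_distance below is a hypothetical stand-in (here simply the identity), used only to show how each converted reuse distance is weighted by its frequency in the RDS.

```python
# Sketch of the weighting step described in the excerpt, assuming an RDS is
# given as a histogram mapping reuse distance -> frequency. The per-distance
# conversion is the excerpt's equation (6), which this page does not show;
# the identity below is a placeholder, NOT the real formula.

from collections import defaultdict

def expected_stack_distance(reuse_distance):
    # Placeholder for equation (6) of the citing paper.
    return reuse_distance

def expected_stack_distance_distribution(rds):
    """Weight each converted reuse distance by its relative frequency."""
    total = sum(rds.values())
    sdd = defaultdict(float)
    for distance, frequency in rds.items():
        sdd[expected_stack_distance(distance)] += frequency / total
    return dict(sdd)

# Example: expected_stack_distance_distribution({1: 50, 8: 30, 64: 20})
# -> {1: 0.5, 8: 0.3, 64: 0.2} under the identity stand-in
```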
“…In fact, most of the prior research on the cold-start problem has been done on cache warmup. Various approaches have been proposed such as no warmup, stale state (also called stitch) [13], fixed warmup [1], cache miss rate estimators [14], no-state-loss [12,15], minimal subset evaluation (MSE) [16], memory reference reuse latency (MRRL) [17], boundary line reuse latency (BLRL) [8,18], self-monitored adaptive cache warmup (SMA) [19], memory hierarchy state (MHS) [5], memory timestamp record (MRT) [7], etc.…”
Section: Cache Warmup
confidence: 99%
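As an illustration only (not any of the cited techniques), the sketch below shows what cache warmup means in sampled simulation: references preceding the measured interval are replayed to warm the cache state, and misses are counted only inside the interval, so cold-start misses are not charged to the sample. The fixed warmup window, fully associative LRU cache, and parameter names are assumptions.

```python
# Minimal warmup sketch for trace sampling: replay `warmup` references before
# the sampled interval to build cache state, then count misses only within
# the measured interval.

from collections import OrderedDict

def warmed_sample_miss_ratio(trace, start, length, warmup, capacity, block_bits=6):
    """Miss ratio over trace[start:start+length] after replaying warmup refs."""
    cache = OrderedDict()      # block address -> None, ordered oldest -> newest
    misses = 0

    def touch(addr, measured):
        nonlocal misses
        block = addr >> block_bits
        if block in cache:
            cache.move_to_end(block)          # hit: mark most recently used
        else:
            if measured:
                misses += 1                   # count misses only when measuring
            cache[block] = None
            if len(cache) > capacity:
                cache.popitem(last=False)     # evict least recently used block

    for addr in trace[max(0, start - warmup):start]:
        touch(addr, measured=False)           # warmup: build state, no statistics
    measured_refs = trace[start:start + length]
    for addr in measured_refs:
        touch(addr, measured=True)            # measured sample interval
    return misses / max(1, len(measured_refs))
```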