Data Cache Prefetching Using a Global History Buffer

Nesbit, Kyle J.; Smith, Jim

doi:10.1109/mm.2005.6

Cited by 86 publications

(112 citation statements)

References 5 publications

Citation statements by type: Supporting; Mentioning (111); Contrasting


“…For comparison with a larger L2, we quadruple the size of the L2 cache of the baseline processor to 4MB, conservatively assuming the same access latency as the base 1MB cache. We corroborate prior results [9,15] showing that GHB, an advanced stride/delta correlating predictor, can eliminate a large fraction of L2 cache misses in many applications, attaining on average 31% performance improvement across the applications we studied. Delta correlation is effective when the data layout is regular and accesses to distinct addresses follow a repeating pattern.…”

Section: Speedup (supporting)

confidence: 89%

“…Table 3 compares LT-cords performance with the program counter / delta correlation variant of the Global History Buffer (GHB PC/DC, subsumes stride prefetching), a realistic DBCP implementation, and a baseline processor with a larger L2 cache. GHB uses 256-entry index and history tables, as recommended for SPEC applications [9,15]. The realistic DBCP is implemented with a 2MB on-chip correlation table as in [12].…”

Section: Speedup (mentioning)

confidence: 99%

“…More recently, researchers have proposed generalizing stride predictors to target miss sequences that exhibit recurring patterns of (non-constant) strides, an approach called delta correlation. The delta-correlating Global History Buffer (GHB) prefetcher [15] was recently shown to outperform a variety of other hardware prefetching schemes [9]. Although delta correlation subsumes many previous approaches, it is not effective for data structures with irregular access patterns.…”

Section: Introduction (mentioning)

confidence: 99%
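Delta correlation, as described in the quote above, can be illustrated on a single miss-address stream (for example, the misses of one load PC). This is a minimal sketch, assuming a correlation key made of the two most recent deltas, a search from the newest history backwards, and that the deltas observed after the earlier match are replayed cyclically until the prefetch degree is reached; the function name and `degree` parameter are hypothetical.

```python
from typing import List


def delta_correlate(addrs: List[int], degree: int = 4) -> List[int]:
    """Predict the next `degree` miss addresses of one stream by delta correlation."""
    deltas = [b - a for a, b in zip(addrs, addrs[1:])]
    n = len(deltas)
    if n < 3:
        return []  # need a key pair plus at least one older delta to match against
    key = (deltas[-2], deltas[-1])
    # Search from the most recent history backwards for an earlier key pair.
    for j in range(n - 2, 0, -1):
        if (deltas[j - 1], deltas[j]) == key:
            replay = deltas[j + 1:]  # deltas observed after the earlier match
            preds, addr = [], addrs[-1]
            for k in range(degree):
                addr += replay[k % len(replay)]
                preds.append(addr)
            return preds
    return []  # no recurring delta pair: issue no prefetches


# Deltas repeat the pattern +64, +128, +192 (a non-constant stride).
misses = [0, 64, 192, 384, 448, 576, 768, 832, 960]
print(delta_correlate(misses))  # [1152, 1216, 1344, 1536]
```

A constant stride is the special case where every delta is equal, which is why delta correlation subsumes stride prefetching; a stream with no recurring delta pair yields no prediction, matching the quote's point about irregular data structures.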

“…Another proposed class of prefetchers utilizes address correlation [3,4,10,11,15,20], which promises wider applicability across a diverse spectrum of workloads because they target generalized memory access patterns. Rather than detecting patterns in data layout, these prefetchers correlate data addresses to predict future misses.…”

Section: Introduction (mentioning)

confidence: 99%
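Address correlation, in contrast to delta correlation, remembers which miss address tended to follow which, so it can cover irregular layouts such as pointer chasing. The sketch below is a generic Markov-style table, assuming each miss address maps to its most recent successor misses, capped by a hypothetical `successors` parameter; it ignores table capacity and is not LT-cords or any specific design cited in the quote.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class AddressCorrelationTable:
    """Markov-style address correlation: miss address -> recent successor misses."""

    def __init__(self, successors: int = 2) -> None:
        self.successors = successors  # hypothetical per-entry successor limit
        self.table: Dict[int, Deque[int]] = {}
        self.last: Optional[int] = None

    def miss(self, addr: int) -> List[int]:
        """Record a miss at `addr` and return predicted next misses, newest first."""
        if self.last is not None:
            succ = self.table.setdefault(self.last, deque(maxlen=self.successors))
            if addr in succ:
                succ.remove(addr)  # move a repeated successor to the front
            succ.appendleft(addr)  # a full deque drops its oldest successor
        self.last = addr
        return list(self.table.get(addr, ()))


# An irregular (pointer-chasing) miss sequence that repeats: the second miss
# to 0x1000 predicts 0x7340, and the next miss to 0x7340 predicts 0x2A80.
t = AddressCorrelationTable()
seq = [0x1000, 0x7340, 0x2A80, 0x1000, 0x7340]
print([t.miss(a) for a in seq])
```

The cost of this generality is storage: the table needs an entry per distinct miss address, which is why practical address-correlating designs keep large tables, sometimes off-chip.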


Last-Touch Correlated Data Streaming

Ferdman

Falsafi

2007

2007 IEEE International Symposium on Performance Analysis of Systems & Software


“…Hardware and software prefetching techniques have been studied extensively [10,33,11,25,24,31,4]. Hardware-controlled prefetchers are highly effective for applications with regular data access patterns [4]; they have been integrated into all modern high-performance processors, including Intel Core i3/i5/i7, AMD Opteron and IBM POWER, and many embedded and mobile processors, such as ARM's Cortex-A9 and Cortex-A15.…”

Section: Related Work (mentioning)

confidence: 99%
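A classic form of the regular-pattern hardware prefetcher mentioned above is a PC-indexed stride table. The sketch below is a generic textbook version, assuming one entry per load PC holding the last address, the last stride, and a saturating confidence counter; the class name and the `degree`, `threshold`, and `max_conf` values are hypothetical and do not describe any of the processors named in the quote.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StrideEntry:
    last_addr: int
    stride: int = 0
    confidence: int = 0


class StridePrefetcher:
    """PC-indexed stride prefetcher with a saturating confidence counter."""

    def __init__(self, degree: int = 2, threshold: int = 2, max_conf: int = 3) -> None:
        self.degree = degree        # prefetches issued per trigger
        self.threshold = threshold  # confidence needed before prefetching
        self.max_conf = max_conf    # saturation limit of the counter
        self.table: Dict[int, StrideEntry] = {}

    def access(self, pc: int, addr: int) -> List[int]:
        """Train on one access and return the addresses to prefetch."""
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = StrideEntry(last_addr=addr)
            return []
        stride = addr - entry.last_addr
        if stride == entry.stride and stride != 0:
            entry.confidence = min(entry.confidence + 1, self.max_conf)
        else:
            entry.stride = stride  # retrain on the new stride
            entry.confidence = 0
        entry.last_addr = addr
        if entry.confidence >= self.threshold:
            return [addr + k * stride for k in range(1, self.degree + 1)]
        return []


pf = StridePrefetcher()
for a in [0, 64, 128, 192]:
    out = pf.access(0x400, a)
print(out)  # [256, 320]
```

The confidence counter keeps a single irregular access from triggering useless prefetches, at the cost of a short warm-up before a new stream is covered.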

Adaptive Runtime-Assisted Block Prefetching on Chip-Multiprocessors

García

Rico

Villavieja

et al. 2016

Int J Parallel Prog


Memory stalls are a significant source of performance degradation in modern processors. Data prefetching is a widely adopted and well-studied technique used to alleviate this problem. Prefetching can be performed by the hardware, or be initiated and controlled by software. Among software-controlled prefetching we find a wide variety of schemes, including runtime-directed prefetching and, more specifically, runtime-directed block prefetching. This paper proposes a hybrid prefetching mechanism that integrates a software-driven block prefetcher with existing hardware prefetching techniques. Our runtime-assisted software prefetcher brings large blocks of data on-chip with the support of a low-cost hardware engine, and synergizes with existing hardware prefetchers that manage locality at a finer granularity. The runtime system that drives the prefetch engine dynamically selects which cache to prefetch to. Our evaluation on a set of scientific benchmarks obtains a maximum speedup of 32% and 10% on average compared to a baseline with hardware prefetching only. As a result, we also achieve a reduction in energy-to-solution of up to 18% and 3% on average.


Algorithmic Ramifications of Prefetching in Memory Hierarchy

Verma

Sen

2006

High Performance Computing - HiPC 2006


Abstract. External memory models, the most notable being the I-O model [3], capture the effects of the memory hierarchy and aid in algorithm design. More than a decade of architectural advancements have led to new features not captured in the I-O model, most notably the prefetching capability. We propose a relatively simple Prefetch model that incorporates data prefetching into the traditional I-O models and show how to design algorithms that can attain close to peak memory bandwidth. Unlike (the inverse of) memory latency, memory bandwidth is much closer to the processing speed; thus, intelligent use of prefetching can considerably mitigate the I-O bottleneck. For some fundamental problems, our algorithms attain running times approaching those of idealized Random Access Machines under reasonable assumptions. Our work also explains the significantly superior performance of I-O-efficient algorithms on systems that support prefetching compared to those that do not.
