2005
DOI: 10.1109/mm.2005.6

Data Cache Prefetching Using a Global History Buffer

Cited by 86 publications (112 citation statements)
References 5 publications
Citation types: 1 supporting, 111 mentioning, 0 contrasting
“…For comparison with a larger L2, we quadruple the size of the L2 cache of the baseline processor to 4MB, conservatively assuming the same access latency as the base 1MB cache. We corroborate prior results [9,15] showing that GHB, an advanced stride/delta correlating predictor, can eliminate a large fraction of L2 cache misses in many applications, attaining on average 31% performance improvement across the applications we studied. Delta correlation is effective when the data layout is regular and accesses to distinct addresses follow a repeating pattern.…”
Section: Speedup
Type: supporting
confidence: 89%
“…Table 3 compares LT-cords performance with the program counter / delta correlation variant of the Global History Buffer (GHB PC/DC, subsumes stride prefetching), a realistic DBCP implementation, and a baseline processor with a larger L2 cache. GHB uses 256-entry index and history tables, as recommended for SPEC applications [9,15]. The realistic DBCP is implemented with a 2MB on-chip correlation table as in [12].…”
Section: Speedup
Type: mentioning
confidence: 99%
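
As a rough illustration of the PC/DC (program counter / delta correlation) idea named in the statement above, the following Python sketch keeps a circular history buffer plus an index table keyed by PC, and replays the deltas that followed an earlier occurrence of the two most recent deltas. The 256-entry table sizes come from the quoted statement; the class name `GhbPcDc`, the `degree` parameter, the LRU index eviction, and the two-delta correlation key are assumptions made for this illustration and are not confirmed by the citing papers.

```python
from collections import OrderedDict


class GhbPcDc:
    """Simplified software model of a PC-indexed, delta-correlating prefetcher."""

    def __init__(self, ghb_size=256, index_size=256, degree=4):
        self.ghb_size = ghb_size        # history buffer entries (256 per the quote)
        self.index_size = index_size    # index table entries (256 per the quote)
        self.degree = degree            # assumed: max prefetches issued per access
        self.ghb = [None] * ghb_size    # slot holds (seq, addr, prev_seq_same_pc)
        self.index = OrderedDict()      # pc -> seq of that PC's newest entry
        self.next_seq = 0

    def _lookup(self, seq):
        """Return the entry with this sequence number, or None if overwritten."""
        if seq is None:
            return None
        slot = self.ghb[seq % self.ghb_size]
        if slot is None or slot[0] != seq:
            return None
        return slot

    def access(self, pc, addr):
        """Record a miss address for `pc` and return a list of prefetch addresses."""
        prev_seq = self.index.get(pc)
        seq = self.next_seq
        self.next_seq += 1
        self.ghb[seq % self.ghb_size] = (seq, addr, prev_seq)

        # Assumed policy: evict the least recently used PC when the index is full.
        self.index[pc] = seq
        self.index.move_to_end(pc)
        if len(self.index) > self.index_size:
            self.index.popitem(last=False)

        # Follow the per-PC linked list, newest entry first.
        addrs = []
        slot = self._lookup(seq)
        while slot is not None:
            addrs.append(slot[1])
            slot = self._lookup(slot[2])
        addrs.reverse()  # oldest first

        deltas = [b - a for a, b in zip(addrs, addrs[1:])]
        if len(deltas) < 3:
            return []

        # Correlation key: the two most recent deltas.
        key = (deltas[-2], deltas[-1])
        # Search backward for an earlier occurrence of the same delta pair.
        for i in range(len(deltas) - 3, 0, -1):
            if (deltas[i - 1], deltas[i]) == key:
                prefetches = []
                next_addr = addr
                # Replay the deltas that followed the earlier occurrence.
                for d in deltas[i + 1 : i + 1 + self.degree]:
                    next_addr += d
                    prefetches.append(next_addr)
                return prefetches
        return []


prefetcher = GhbPcDc()
# One load instruction whose addresses advance by alternating deltas +1, +2.
results = [prefetcher.access(0x400, a) for a in [100, 101, 103, 104, 106]]
print(results[-1])
```

In this toy trace the first four accesses produce no prefetches because too little delta history exists; the fifth finds the earlier (+1, +2) pair and replays the deltas that followed it. Real hardware would use fixed-width pointers and a direct-mapped or set-associative index table rather than Python containers.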
“…Hardware and software prefetching techniques have been studied extensively [10,33,11,25,24,31,4]. Hardware-controlled prefetchers are highly effective for applications with regular data access patterns [4]; they have been integrated into all modern high-performance processors, including Intel Core i3/i5/i7, AMD Opteron and IBM POWER, and many embedded and mobile processors, such as ARM's Cortex-A9 and Cortex-A15.…”
Section: Related Work
Type: mentioning
confidence: 99%