2005
DOI: 10.1109/mm.2005.53

Kilo-Instruction Processors: Overcoming the Memory Wall

Abstract: Historically, advances in integrated circuit technology have driven improvements in processor microarchitecture and led to today's microprocessors with sophisticated pipelines operating at very high clock frequencies. However, performance improvements achievable by high-frequency microprocessors have become seriously limited by main-memory access latencies because main-memory speeds have improved at a much slower pace than microprocessor speeds. It's crucial to deal with this performance disparity, commonly kn…

Citations: cited by 50 publications (25 citation statements)
References: 12 publications
“…However, even with a perfect estimator, dual-path/multipath has less potential than DMP because (1) dual-path is applicable to one low-confidence branch at a time (as explained previously in Section 5.1), (2) the overhead of dual-path/multipath is still much higher than that of DMP for a low-confidence branch because dual-path/multipath executes the same instructions twice/multiple times after a control-independent point in the program. 13 These numbers are actually lower than what was previously published [17] because our baseline branch predictor uses a different algorithm and has a much higher prediction accuracy than that of [17]. Figure 15 (left) shows the average increase/reduction due to DMP in the number of fetched/executed instructions, maximum power, energy, and energy-delay product compared to the baseline.…”
Section: Effect Of a Different Branch Predictor
Citation type: mentioning
confidence: 82%
“…Our baseline employs a 2KB enhanced JRS confidence estimator [19], which has 14% PVN (accuracy) and 70% SPEC (coverage) [17]. 13 Even with a 512-byte estimator, DMP still provides 18.4% performance improvement. The benefit of dual-path/multipath increases significantly with a perfect estimator because dual-path/multipath has very high overhead as shown in Figure 7, and a perfect confidence estimator eliminates the incurrence of this large overhead for correctly-predicted branches.…”
Section: Effect Of a Different Branch Predictor
Citation type: mentioning
confidence: 99%
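
The PVN and SPEC figures quoted in the statement above can be read as ratios over a confidence estimator's outcome counts. The sketch below is illustrative only: it assumes the usual definitions (PVN as the fraction of low-confidence branches that were actually mispredicted, SPEC as the fraction of mispredicted branches flagged low-confidence), and the class name, function names, and counts are hypothetical values chosen to reproduce the quoted 14% and 70%, not data from the cited paper.

```python
from dataclasses import dataclass


@dataclass
class ConfidenceCounts:
    """Branch outcomes split by prediction correctness and estimator verdict."""
    correct_high: int    # correctly predicted, flagged high-confidence
    correct_low: int     # correctly predicted, flagged low-confidence
    incorrect_high: int  # mispredicted, flagged high-confidence
    incorrect_low: int   # mispredicted, flagged low-confidence


def pvn(c: ConfidenceCounts) -> float:
    """Accuracy: share of low-confidence branches that were mispredicted."""
    low = c.correct_low + c.incorrect_low
    return c.incorrect_low / low if low else 0.0


def spec(c: ConfidenceCounts) -> float:
    """Coverage: share of mispredicted branches flagged low-confidence."""
    incorrect = c.incorrect_high + c.incorrect_low
    return c.incorrect_low / incorrect if incorrect else 0.0


# Hypothetical counts, picked only so the ratios match the quoted figures.
counts = ConfidenceCounts(correct_high=9400, correct_low=430,
                          incorrect_high=30, incorrect_low=70)
print(f"PVN  = {pvn(counts):.2f}")
print(f"SPEC = {spec(counts):.2f}")
```

Under these definitions a low PVN means that most branches flagged low-confidence were in fact predicted correctly, which is consistent with the statement's remark that a perfect estimator removes overhead for correctly-predicted branches.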
“…To the best of our knowledge, Cristal et al. were the first to suggest that the ability to support many in-flight instructions is very effective in overcoming the memory wall [22]. Unfortunately, they thought (as did many others) that a simple implementation, based on enlarging the window resources, was impractical because of the delay, area, and power overheads.…”
Section: Large Instruction Window
Citation type: mentioning
confidence: 99%
“…If a last-level cache (LLC) miss occurs, the processor stalls waiting for the requested data to be retrieved from main memory, the latency of which is hundreds of clock cycles. To mitigate against this, aggressive out-of-order execution is an effective approach [1]. In this approach, the instruction window, composed of the reorder buffer (ROB), the issue queue (IQ), and the load/store queue (LSQ), is extensively enlarged and instructions are aggressively reordered at issue time.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
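
The effect described in the statement above, where a larger instruction window hides long main-memory latencies, can be illustrated with a toy model. This is a rough sketch under strong assumptions that do not come from the cited papers: one instruction is dispatched per cycle, all instructions are independent, retirement is in order, and the window is modeled as a single reorder buffer of `window_size` entries. The function name and the latency pattern are hypothetical.

```python
def total_cycles(latencies, window_size):
    """Cycle at which the last instruction retires in a toy in-order-retire model.

    latencies[i] is the execution latency of instruction i in cycles.
    An instruction can be dispatched only once the instruction that is
    window_size positions older has retired and freed its window entry.
    """
    dispatch = []
    retire = []
    for i, lat in enumerate(latencies):
        # At most one dispatch per cycle.
        earliest = dispatch[-1] + 1 if dispatch else 0
        # Wait for a free window entry when the window is full.
        if i >= window_size:
            earliest = max(earliest, retire[i - window_size])
        dispatch.append(earliest)
        done = earliest + lat
        # In-order retirement: never retire before the previous instruction.
        retire.append(max(done, retire[-1]) if retire else done)
    return retire[-1] if retire else 0


# Hypothetical workload: every 100th instruction is a 300-cycle cache miss.
workload = [300 if i % 100 == 0 else 1 for i in range(1000)]
for size in (32, 128, 1024):
    print(size, total_cycles(workload, size))
```

In this model a small window fills up behind each miss and dispatch stalls until the miss returns, while a window large enough to hold the instructions between misses lets successive misses overlap. The model ignores dependences, issue-queue and load/store-queue capacity, and branch mispredictions, so it shows only the direction of the effect.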