Kilo-Instruction Processors: Overcoming the Memory Wall

Cristal, Adrián; Santana, Oliverio J.; Cazorla, Francisco J.; Galluzzi, Marco; Ramirez, T.; Pericàs, Miquel; Valero, Mateo

doi:10.1109/mm.2005.53

Cited by 50 publications

(25 citation statements)

References 12 publications


“…However, even with a perfect estimator, dual-path/multipath has less potential than DMP because (1) dual-path is applicable to one low-confidence branch at a time (as explained previously in Section 5.1), (2) the overhead of dual-path/multipath is still much higher than that of DMP for a low-confidence branch because dual-path/multipath executes the same instructions twice/multiple times after a control-independent point in the program. 13 These numbers are actually lower than what was previously published [17] because our baseline branch predictor uses a different algorithm and has a much higher prediction accuracy than that of [17]. Figure 15 (left) shows the average increase/reduction due to DMP in the number of fetched/executed instructions, maximum power, energy, and energy-delay product compared to the baseline.…”

Section: Effect Of a Different Branch Predictor (mentioning)

confidence: 82%

“…Our baseline employs a 2KB enhanced JRS confidence estimator [19], which has 14% PVN (accuracy) and 70% SPEC (coverage) [17]. 13 Even with a 512-byte estimator, DMP still provides 18.4% performance improvement. The benefit of dual-path/multipath increases significantly with a perfect estimator because dual-path/multipath has very high overhead as shown in Figure 7, and a perfect confidence estimator eliminates the incurrence of this large overhead for correctly-predicted branches.…”

Section: Effect Of a Different Branch Predictor (mentioning)

confidence: 99%
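
Note on the PVN and SPEC figures quoted above: the sketch below shows how the two metrics are commonly computed for a branch confidence estimator. PVN is taken here as the fraction of low-confidence estimates that were really mispredicted (the "accuracy" in the quote), and SPEC as the fraction of mispredicted branches that were flagged low-confidence (the "coverage"). The function names and the branch counts are hypothetical, chosen only so that the results match the quoted 14% and 70%; they are not data from the cited paper.

```python
def pvn(low_conf_mispredicted: int, low_conf_correct: int) -> float:
    """Share of low-confidence estimates that were actual mispredictions."""
    return low_conf_mispredicted / (low_conf_mispredicted + low_conf_correct)


def spec(low_conf_mispredicted: int, high_conf_mispredicted: int) -> float:
    """Share of all mispredicted branches that were flagged low-confidence."""
    return low_conf_mispredicted / (low_conf_mispredicted + high_conf_mispredicted)


# Hypothetical counts (not measured data): 70 mispredictions flagged
# low-confidence, 430 correct predictions flagged low-confidence, and
# 30 mispredictions that slipped through as high-confidence.
print(f"PVN  = {pvn(70, 430):.0%}")   # prints "PVN  = 14%"
print(f"SPEC = {spec(70, 30):.0%}")   # prints "SPEC = 70%"
```

Under these assumed counts, most low-confidence flags are false alarms, which is consistent with the quote's point that a perfect estimator removes a large overhead on correctly predicted branches.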

“…In the near future, processors are expected to support a large number of in-flight instructions [30,42,10,7,13] to extract both ILP and memory-level parallelism (MLP). As shown by previous research [27,40,41,30,42], the performance improvement provided by both pipelining and large instruction windows critically depends on the accuracy of the processor's branch predictor.…”

Section: Introduction (mentioning)

confidence: 99%


Diverge-Merge Processor (DMP): Dynamic Predicated Execution of Complex Control-Flow Graphs Based on Frequently Executed Paths

Kim

Joao

Mutlu

et al. 2006

2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06)


“…To the best of our knowledge, Cristal et al. were the first to suggest that the ability to support many in-flight instructions is very effective in overcoming the memory wall [22]. Unfortunately, they thought (as did many others) that a simple implementation, based on enlarging the window resources, was impractical because of the delay, area, and power overheads.…”

Section: Large Instruction Window (mentioning)

confidence: 99%

MLP-Aware Dynamic Instruction Window Resizing in Superscalar Processors for Adaptively Exploiting Available Parallelism

Kora

Yamaguchi

Ando

2014

IEICE Trans. Inf. & Syst.


SUMMARY: Single-thread performance has not improved much over the past few years, despite an ever increasing transistor budget. One of the reasons for this is that there is a speed gap between the processor and main memory, known as the memory wall. A promising method to overcome this memory wall is aggressive out-of-order execution by extensively enlarging the instruction window resources to exploit memory-level parallelism (MLP). However, simply enlarging the window resources lengthens the clock cycle time. Although pipelining the resources solves this problem, it in turn prevents instruction-level parallelism (ILP) from being exploited because issuing instructions requires multiple clock cycles. This paper proposed a dynamic scheme that adaptively resizes the instruction window based on the predicted available parallelism, either ILP or MLP. Specifically, if the scheme predicts that MLP is available during execution, the instruction window is enlarged and the window resources are pipelined, thereby exploiting MLP. Conversely, if the scheme predicts that less MLP is available, that is, ILP is exploitable for improved performance, the instruction window is shrunk and the window resources are de-pipelined, thereby exploiting ILP. Our evaluation results using the SPEC2006 benchmark programs show that the proposed scheme achieves nearly the best performance possible with fixed-size resources. On average, our scheme realizes a performance improvement of 21% over that of a conventional processor, with additional cost of only 6% of the area of the conventional processor core or 3% of that of the entire processor chip. The evaluation results also show 8% better energy efficiency in terms of 1/EDP (energy-delay product).
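
The resizing decision described in this summary can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how MLP is predicted, so the prediction is passed in as a plain boolean, and the names `WindowConfig`, `choose_window`, and the two entry counts are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowConfig:
    entries: int      # instruction window capacity (hypothetical value)
    pipelined: bool   # whether the window resources are pipelined


# Assumed configurations; the real sizes are not given in the summary.
SMALL_WINDOW = WindowConfig(entries=64, pipelined=False)   # favors ILP
LARGE_WINDOW = WindowConfig(entries=256, pipelined=True)   # favors MLP


def choose_window(mlp_predicted: bool) -> WindowConfig:
    """Enlarge and pipeline the window when MLP is predicted to be
    available; otherwise shrink and de-pipeline it to exploit ILP."""
    return LARGE_WINDOW if mlp_predicted else SMALL_WINDOW


# Hypothetical sequence of per-phase predictions.
for phase, mlp in enumerate([False, True, True, False]):
    cfg = choose_window(mlp)
    print(phase, cfg.entries, "pipelined" if cfg.pipelined else "de-pipelined")
```

The sketch captures only the two-way choice stated in the summary; a real scheme would also need a predictor and a transition mechanism between configurations.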


“…If a last-level cache (LLC) miss occurs, the processor stalls waiting for the requested data to be retrieved from main memory, the latency of which is hundreds of clock cycles. To mitigate against this, aggressive out-of-order execution is an effective approach [1]. In this approach, the instruction window, composed of the reorder buffer (ROB), the issue queue (IQ), and the load/store queue (LSQ), is extensively enlarged and instructions are aggressively reordered at issue time.…”

Section: Introduction (mentioning)

confidence: 99%

Performance of Dynamic Instruction Window Resizing for a Given Power Budget under DVFS Control

Ando

Shioya

2016

IEICE Trans. Inf. & Syst.


Hideki Ando and Ryota Shioya, Members. SUMMARY: Dynamic instruction window resizing (DIWR) is a scheme that effectively exploits both memory-level parallelism and instruction-level parallelism by configuring the instruction window size appropriately for exploiting each parallelism. Although a previous study has shown that the DIWR processor achieves a significant speedup, power consumption has not been explored. The power consumption is increased in DIWR because the instruction window resources are enlarged in memory-intensive phases. If the power consumption exceeds the power budget determined by certain requirements, the DIWR processor must save power and thus, the performance previously presented cannot be achieved. In this paper, we explore to what extent the DIWR processor can achieve improved performance for a given power budget, assuming that dynamic voltage and frequency scaling (DVFS) is introduced as a power saving technique. Evaluation results using the SPEC2006 benchmark programs show that the DIWR processor, even with a constrained power budget, achieves a speedup over the conventional processor over a wide range of given power budgets. At the most important power budget point, i.e., when the power a conventional processor consumes without any power constraint is supplied, DIWR achieves a 16% speedup.
