Improving quasi-dynamic schedules through region slip

Spadini, Francesco; Fahs, Brian; Patel, Sanjay J.; Lumetta, Steven S.

doi:10.1109/cgo.2003.1191541

Cited by 8 publications

(10 citation statements)

References 28 publications

“…Since there are multiple queues, instructions from different LRR blocks can be overlapped (Figure 4). Instructions in each basic block are therefore issued in their statically scheduled order but are overlapped with instructions from successive basic blocks (Note, the LR queue is conceptually similar to the region-slip-enabled issue buffer proposed by Spadini et al [26]. Their proposed mechanism uses a FIFO-based issue buffer that allows a block's schedule to 'slip' into the schedule of a previous block.…”

Section: Reorder-sensitive Issue Logic

mentioning

confidence: 98%

Low-power, low-complexity instruction issue using compiler assistance

Valluri, John, McKinley

2005

Proceedings of the 19th Annual International Conference on Supercomputing

In an out-of-order issue processor, instructions are dynamically reordered and issued to function units in their data-ready order rather than their original program order to achieve high performance. The logic that facilitates dynamic issue is one of the most power-hungry and time-critical components in a typical out-of-order issue processor. This paper develops a cooperative hardware/software technique to reduce the complexity and energy consumption of the issue logic. The proposed scheme is based on the observation that not all instructions in a program require the same amount of dynamic reordering. Instructions that belong to basic blocks for which the compiler can perform near-optimal scheduling do not need any intra-block instruction reordering but require only inter-block instruction overlap. In contrast, blocks where the compiler is limited by artificial dependences and memory misses require both intra-block and inter-block instruction reordering. The proposed Reorder-Sensitive Issue Scheme utilizes a novel compile-time analyzer to evaluate the quality of schedules generated by the static scheduler and to estimate the dynamic reordering requirement of instructions within each basic block. At the microarchitecture level, we propose a novel issue queue that exploits the varying dynamic scheduling requirement of basic blocks to lower the power dissipation and complexity of the dynamic issue hardware. An evaluation of the technique on several SPEC integer benchmarks indicates that we can reduce the energy consumption in the issue queue on average by 72% with only 5% performance degradation. Additionally, the proposed issue hardware is significantly less complex than a conventional monolithic out-of-order issue queue, providing the potential for high clock speeds.
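The distinction the abstract draws between intra-block reordering and inter-block overlap can be illustrated with a toy model. The sketch below is not the issue queue from the paper: the function `simulate`, the `(name, deps, latency)` instruction format, and the sample blocks are all assumptions made for illustration. Each basic block sits in its own FIFO queue, only the head of a queue may issue, and a head issues once the results it depends on are available. A single shared FIFO serves as the strictly in-order baseline.

```python
def simulate(blocks, single_fifo=False):
    """Return the number of cycles needed to issue every instruction.

    blocks: list of basic blocks in program order; each block is a list of
            (name, deps, latency) tuples in static schedule order, where
            deps names older instructions (hypothetical format).
    single_fifo: if True, all blocks share one in-order queue (baseline).
    """
    if single_fifo:
        queues = [[ins for block in blocks for ins in block]]
    else:
        queues = [list(block) for block in blocks]  # one FIFO per block

    done_at = {}  # instruction name -> cycle at which its result is available
    cycle = 0
    while any(queues):
        issued = []
        for queue in queues:
            if not queue:
                continue
            name, deps, latency = queue[0]  # only the head may issue
            if all(done_at.get(d, float("inf")) <= cycle for d in deps):
                queue.pop(0)
                issued.append((name, cycle + latency))
        # Results become visible only after this cycle's issue decisions.
        for name, ready in issued:
            done_at[name] = ready
        cycle += 1
    return cycle


# Hypothetical example: block A stalls on a 3-cycle load, block B is independent.
block_a = [("a1", [], 3), ("a2", ["a1"], 1)]
block_b = [("b1", [], 1), ("b2", ["b1"], 1)]

in_order_cycles = simulate([block_a, block_b], single_fifo=True)
overlapped_cycles = simulate([block_a, block_b])
```

In this example the shared FIFO takes 6 cycles because block B waits behind the stalled load, while the per-block queues finish in 4 cycles: each block keeps its static order, yet block B overlaps with block A. The model ignores issue width, queue capacity, and branch behaviour.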

“…The out-of-order pipeline schedules and renames the instructions. However, there is an in-order rePLay [16], where the instructions in a trace are scheduled and renamed after optimization, and executed in an in-order pipeline. It uses a conventional register file, and a trace needs to record its live-in and live-out registers.…”

Section: A Caching Proposals

mentioning

confidence: 99%

Reusing cached schedules in an out-of-order processor with in-order issue logic

Palomar, Juan, Navarro

2009

2009 IEEE International Conference on Computer Design

The complex and powerful out-of-order issue logic ignores the repetitive nature of code, unlike caches or branch predictors. We show that in 90% of the cycles, the group of instructions selected by the issue logic belongs to just 13% of the total different groups issued: the issue logic of an out-of-order processor is constantly re-discovering what it has already found. To benefit from the repetitive nature of instruction issue, we move the scheduling logic after the commit stage, out of the critical path of execution. The schedules created there are cached and reused to feed a simple in-order issue logic, which could result in a higher-frequency design. We present the complete design of our ReLaSch processor, which achieves the same average IPC as a conventional out-of-order processor and a 1.56 speed-up over the IPC of an in-order processor. We actually surpass the out-of-order IPC in 23 out of 40 SPEC benchmarks, mainly because the broader view of the code after the commit stage allows creating better schedules.
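The repetition statistic in this abstract (a small share of distinct issue groups accounting for most cycles) can be computed from an issue trace as sketched below. The function name `groups_covering`, the trace format (one tuple of instruction addresses per cycle), and the sample data are assumptions made for illustration; they are not taken from the ReLaSch paper.

```python
from collections import Counter


def groups_covering(issue_trace, coverage=0.9):
    """Fraction of distinct issue groups needed to cover `coverage` of all
    cycles, counting the most frequently issued groups first."""
    counts = Counter(issue_trace)
    if not counts:
        return 0.0
    total = sum(counts.values())
    covered = 0
    needed = 0
    for _, n in counts.most_common():
        if covered >= coverage * total:
            break
        covered += n
        needed += 1
    return needed / len(counts)


# Hypothetical trace: one hot group issued in 95 cycles, five rare groups once each.
trace = [(0x10, 0x14)] * 95 + [(0x100 + 4 * i,) for i in range(5)]
share = groups_covering(trace)  # 1 of 6 distinct groups covers 90% of the cycles
```

A low value of `share` means that a cache of previously built schedules would be hit in most cycles, which is the property the cached-schedule design relies on.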

“…We believe the DTSVLIW architecture to be simpler than that of DIF and easier to implement. Work on the DIF appears to have ceased, while it is not clear how rePLay could be extended to multi-threading: the rePLay paper [9] states that the scheduler can be hardware or software based, and the hardware scheduler takes 10 clock cycles for each instruction scheduled, which indicates that the scheduler does not lie in the main execution path. In the DTSVLIW, scheduling occurs in a scalar mode of execution with the scheduler designed to process 1 instruction per cycle, although this may not be achieved consistently because of delays in the arrival of instructions at the scheduler due to latency issues elsewhere.…”

mentioning

confidence: 97%

“…A number of architectures perform dynamic code scheduling of the input code stream to identify concurrent code sequences, as "a schedule created at run-time is often better than one created at compile time" [9]. Thus DIF [10], DTSVLIW [11][12][13][14][15][16], and rePLay [9] architectures are all single threaded ones that do dynamic code scheduling on a single process.…”

mentioning

confidence: 99%

“…Thus DIF [10], DTSVLIW [11][12][13][14][15][16], and rePLay [9] architectures are all single threaded ones that do dynamic code scheduling on a single process. The DIF and DTSVLIW architectures both schedule a scalar instruction stream into VLIW long words using essentially the same concept, while rePLay uses a static scheduler to produce the source binary code, which is then optimized by a scheduler during code execution.…”

mentioning

confidence: 99%

Dynamic Instruction Scheduling in a Trace-based Multi-threaded Architecture

Rounce, Souza

2008

Int J Parallel Prog

Simulation results are presented using the hardware-implemented, trace-based dynamic instruction scheduler of our single-process DTSVLIW architecture to schedule instructions from several processes into multiple streams of VLIW instructions for execution by a wide-issue, simultaneous multi-threading (SMT) execution engine. The scheduling process involves single-instruction execution of each process, dynamically scheduling executed instructions into blocks of VLIW instructions that are cached for subsequent SMT execution: SMT provides a mechanism to reduce the impact of horizontal and vertical waste, and of variable memory latencies, seen in the DTSVLIW. Preliminary experiments explore this extended model. Results achieve PE utilization of up to 87% on a 4-thread, 1-scalar, 8-PE design, with speed-ups of up to 6.3 times over a single processor. Notably, only a single scalar process needs to be scheduled at any time, with main memory fetches being 1-4% of those of a single processor.
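The idea of scheduling an executed scalar stream into blocks of VLIW instructions can be sketched with a generic greedy packer. This is not the DTSVLIW scheduler: the function `pack_vliw`, the `(name, deps)` stream format, unit latencies, and the assumption that only true data dependences matter are all simplifications made for illustration.

```python
def pack_vliw(stream, width):
    """Greedily pack a scalar instruction stream into long words.

    stream: list of (name, deps) in execution order, where deps names
            earlier instructions whose results are needed (hypothetical format).
    width:  maximum number of instructions per long word.
    Returns a list of long words, each a list of instruction names.
    """
    words = []    # words[i] holds the instructions issued together in step i
    placed = {}   # instruction name -> index of the word that holds it
    for name, deps in stream:
        # An instruction must sit after every word that produces one of its inputs.
        i = max((placed[d] + 1 for d in deps), default=0)
        # Skip words that are already full.
        while i < len(words) and len(words[i]) >= width:
            i += 1
        if i == len(words):
            words.append([])
        words[i].append(name)
        placed[name] = i
    return words


# Hypothetical stream: b needs a, d needs b and c, the rest are independent.
stream = [("a", []), ("b", ["a"]), ("c", []), ("d", ["b", "c"]), ("e", [])]
words = pack_vliw(stream, width=2)
```

For this stream the five scalar instructions fit into three long words of width 2, with the independent instructions `c` and `e` filling slots that the dependence chain `a`, `b`, `d` leaves empty.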
