Modern processors use out-of-order logic to achieve a high number of Instructions Per Cycle (IPC), but this logic seriously limits
the achievable clock frequency. To get better performance out of smaller transistors, the trend is to increase the number of cores per die rather than
to make the cores themselves bigger. Moreover, for throughput-oriented and server workloads, simpler in-order processors, which allow more cores per die
and higher design frequencies, are becoming the preferred choice. Unfortunately, for other workloads this type of core results in lower single-thread
performance.
There are many workloads where good single-thread performance is still important. In this thesis we present the ReLaSch processor.
Its aim is to enable high-IPC cores capable of running at high clock frequencies by processing instructions with simple superscalar in-order issue
logic and by caching instruction groups that are dynamically scheduled in hardware after commit, that is, off the critical path and only when really
needed.
Objective
This thesis has several research goals:
• Show that the dynamic scheduler of a conventional out-of-order processor does a lot of redundant work because it ignores the
repetitiveness of code.
• Propose a complete superscalar out-of-order architecture that reduces the amount of redundant work done by creating the
schedules once in dedicated hardware, storing them in a cache of schedules and reusing the schedules as much as possible.
• Move the scheduler off the critical path of execution, which the reduced amount of scheduling work should make possible. As a result,
the execution path of our proposed processor can be simpler than that of a conventional out-of-order processor.
Proposal and results
We present the ReLaSch processor, named after Reused Late Schedules, in which the creation of issue groups is removed from the critical
path of execution and which uses simple, small in-order issue logic. Each cycle it just wakes up and selects the instructions of a single issue group,
instead of processing the instructions of a whole issue queue.
New logic at the end of the conventional pipeline schedules the committed instructions. This scheduler can be complex because it is not on the critical
path of execution. The schedules are cached, and whenever possible an rgroup is read from the cache and its instructions are executed. Reusing the
schedules lowers the pressure on the scheduling logic.
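The effect of reusing cached schedules can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the ReLaSch hardware design: the cache is keyed by a code region's start address, has unlimited capacity, and is never invalidated; `run` and `make_schedule` are hypothetical names introduced for illustration.

```python
def run(trace, make_schedule):
    """Replay a trace of region start addresses, scheduling each region at most once."""
    cache = {}           # start address -> cached schedule (assumed keying)
    scheduler_calls = 0
    for pc in trace:
        if pc not in cache:
            # Miss: the region runs in program order and is scheduled after commit.
            cache[pc] = make_schedule(pc)
            scheduler_calls += 1
        # Hit: the cached schedule is read and issued without rescheduling.
    return scheduler_calls, len(trace)


# A loop body at 0x40 executed 100 times, followed by one exit block at 0x80.
trace = [0x40] * 100 + [0x80]
calls, regions = run(trace, make_schedule=lambda pc: [("group", pc)])
print(calls, regions)  # 2 101
```

In this model the scheduler is invoked twice for 101 executed regions, which is the kind of redundancy that repetitive code exposes.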
In some cases, the ReLaSch processor is able to outperform a conventional out-of-order processor, because the post-commit scheduler has a broader
view of the code. For instance, while ReLaSch can schedule together two independent instructions that are distant in the code, a conventional out-of-order
processor only issues them in the same cycle if both are in flight.
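The benefit of a broader view can be sketched with a greedy grouping routine. This is a simplified model, not the scheduler proposed in the thesis: it assumes unit latencies, registers that are already renamed (only true dependences matter), and it approximates the "in-flight" limit of a conventional processor with a hypothetical `window` parameter counted from the oldest unscheduled instruction.

```python
def schedule(instrs, width, window=None):
    """Pack (dest, srcs) instructions, given in program order, into issue groups."""
    n = len(instrs)
    last_writer, producers = {}, []
    for i, (dest, srcs) in enumerate(instrs):
        # Indices of the earlier instructions that produce this one's sources.
        producers.append({last_writer[s] for s in srcs if s in last_writer})
        last_writer[dest] = i

    cycle_of, groups = {}, []
    while len(cycle_of) < n:
        oldest = next(i for i in range(n) if i not in cycle_of)
        limit = n if window is None else min(n, oldest + window)
        group = []
        for i in range(oldest, limit):
            if len(group) == width:
                break
            # Ready only if every producer was placed in an earlier group.
            if i not in cycle_of and all(p in cycle_of for p in producers[i]):
                group.append(i)
        for i in group:
            cycle_of[i] = len(groups)
        groups.append(group)
    return groups


# Two independent dependence chains of three instructions each.
instrs = [("r1", []), ("r2", ["r1"]), ("r3", ["r2"]),
          ("r4", []), ("r5", ["r4"]), ("r6", ["r5"])]
print(schedule(instrs, width=2, window=2))  # [[0], [1], [2, 3], [4], [5]]
print(schedule(instrs, width=2))            # [[0, 3], [1, 4], [2, 5]]
```

With the narrow window the two chains barely overlap and five groups are needed; with the whole sequence visible the chains are interleaved into three groups.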
The ReLaSch processor predicts branch targets, memory aliases and latencies at scheduling time, off the critical path. The predictions are based
on the most recent executions at scheduling time. Furthermore, most of the register renaming process is performed by the scheduler and is removed
from the execution pipeline.
Our experiments show that ReLaSch has the same average IPC as our reference out-of-order processor and is clearly better than the reference in-order
processor (a speed-up of 1.55). It outperforms the in-order processor in all cases, and in 23 of the 40 benchmarks it has a higher IPC than the
reference out-of-order processor.