Abstract-Two-step physical register deallocation (TSD) is an architectural scheme that enhances memory-level parallelism (MLP) by pre-executing instructions. Ideally, the TSD exploits the MLP that would be available with an unlimited number of physical registers, so only a small register file is needed to exploit MLP. In practice, however, the amount of exploitable MLP is limited, because in some cases pre-execution is not performed or is delayed. This is caused by data dependencies among the pre-executed instructions. This paper proposes the use of value prediction to solve these problems. Unlike the conventional use of value prediction to enhance ILP, our use requires no recovery from misspeculation. Our evaluation using the SPECfp2000 benchmarks shows that our scheme achieves performance equivalent to that of the previous TSD scheme without value prediction while using 75% of the register file size.
I. INTRODUCTION
Supporting many in-flight instructions allows aggressive exploitation of instruction-level parallelism (ILP) and memory-level parallelism (MLP), which increases performance. Exploiting MLP is especially effective in memory-intensive programs. Supporting many in-flight instructions requires a large register file. However, a large register file lengthens the clock cycle time because it takes a long time to access. Pipelining the register file access alleviates this effect, but it complicates the bypass logic. In addition, a deeper pipeline increases the branch misprediction penalty, lowering IPC. It is therefore difficult to remove the adverse effect of a large register file completely, and it is important to reduce the register file size without degrading performance.
Two-step physical register deallocation (TSD) is a novel register renaming scheme [1], [2] that allows the pre-execution of instructions that could not be executed under the conventional renaming scheme for lack of a physical register, thereby exploiting MLP aggressively. The TSD can exploit the large amount of MLP that would be available with an infinite number of physical registers, independently of the actual physical register count. Thus, a large register file is not required for exploiting MLP.
The TSD deallocates physical registers in two phases: 1) the temporal deallocation, which allows the physical register to be allocated to another instruction; and 2) the final deallocation, which allows the result write to be granted. The TSD completely removes the pipeline stall in the rename stage, which is due to a shortage of physical registers, by