Online transaction processing (OLTP) is a multibillion-dollar industry with high-end database servers employing state-of-the-art processors to maximize performance. Unfortunately, recent studies show that CPUs are far from realizing their maximum intended throughput because of delays in the processor caches. When running OLTP, instruction-related delays in the memory subsystem account for 25 to 40% of the total execution time. In contrast to data misses, instruction misses cannot be overlapped with out-of-order execution, and instruction caches cannot grow because their slower access time would directly affect the processor's clock speed. The challenge is to alleviate the instruction-related delays without increasing the cache size. We propose Steps, a technique that minimizes instruction cache misses in OLTP workloads by multiplexing concurrent transactions and exploiting common code paths. One transaction paves the cache with instructions, while close followers enjoy a nearly miss-free execution. Steps yields up to a 96.7% reduction in instruction cache misses for each additional concurrent transaction, and at the same time eliminates up to 64% of mispredicted branches by loading a repeating execution pattern into the CPU. This paper (a) describes the design and implementation of Steps, (b) analyzes Steps using microbenchmarks, and (c) shows Steps performance when running TPC-C on top of the Shore storage manager.
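The core idea can be illustrated with a toy model (a sketch only, not the paper's implementation): concurrent transactions share the same sequence of code "steps", and a conventional scheduler runs each transaction to completion, so each step's code is evicted before it is reused. Steps-style scheduling instead runs one step across the whole transaction group before moving on, so only the lead transaction misses. The cache size, step names, and transaction count below are all invented for illustration.

```python
# Toy instruction-cache model: the cache holds CACHE_BLOCKS code-step
# footprints and is managed with LRU replacement. (Parameters assumed.)
CACHE_BLOCKS = 2
STEPS = ["begin", "probe_index", "fetch_record", "lock", "log", "commit"]

def run(schedule):
    """Count instruction misses for a schedule of code steps (LRU cache)."""
    cache, misses = [], 0          # cache[0] is most recently used
    for step in schedule:
        if step in cache:
            cache.remove(step)     # hit: refresh LRU position
        else:
            misses += 1            # cold or capacity miss
            if len(cache) >= CACHE_BLOCKS:
                cache.pop()        # evict least recently used step
        cache.insert(0, step)
    return misses

N = 10  # concurrent transactions running the same code path

# Conventional: transaction-at-a-time. Each step's code is evicted by the
# four steps that follow it, so every access misses.
conventional = [s for _ in range(N) for s in STEPS]

# Steps-style: step-at-a-time across the group. Only the lead transaction
# misses; the followers find each step's code already cached.
steps_sched = [s for s in STEPS for _ in range(N)]

print(run(conventional))  # 60 misses (6 steps x 10 transactions)
print(run(steps_sched))   # 6 misses (one cold miss per step)
```

The model captures only the reuse pattern, not real cache geometry, but it shows why the miss savings in the abstract scale with the number of concurrent transactions in a group.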
Prologue

In the past decade, research has proposed techniques to identify and reduce CPU performance bottlenecks in database workloads. As memory access times improve much more slowly than processor speeds, performance is bound by instruction and data cache misses that cause expensive main-memory accesses. Research [AD+99] [LB+98] [SBG02] shows that decision-support (DSS) applications are predominantly delayed by data cache misses, whereas OLTP is bound by instruction cache misses. Although several techniques can reduce data cache misses (larger caches, out-of-order execution, better data placement), none of these can effectively address instruction cache misses.