Speculative Multithreaded Processors
Semiconductor technologies, along with innovative computer architectures, have provided the bricks and mortar for building phenomenal improvements in processing speed during the past decade, culminating ultimately in the hundreds of millions of transistors used to build increasingly fast on-chip devices. Innovations in computer microarchitecture and accompanying compilers have enabled us to make good use of these building materials to provide high-performance computing systems.
Typically, we decide how to use available semiconductor resources in two steps. First, we choose the desired functionality: the techniques for extracting and enhancing performance. In the implementation phase, we translate those techniques into structures and signals that we must then design, build, and verify. Although often described separately, in practice these two phases are tightly coupled.
During the 1990s, novel functionality played the dominant role in processor design. Given a reasonable limit on overall design size (for example, fewer than tens of millions of transistors), we could divide up the transistor budget simply by using high-level performance metrics. Doing so made verification relatively simple, and designs did not have to explicitly account for wire delays, which were not significant compared to logic delays.
In the future, implementation issues will likely dominate even basic functionality. We have begun to realize that scaling conventional superscalar designs increases complexity and cost with no guarantee that such designs will meet performance goals. Monolithic designs that use hundreds of millions of transistors will be very difficult to design, debug, and verify, and increasing wire delays will make intrachip communication and clock distribution costly.
Consequently, some computer architects advocate a shift from high-performance to high-throughput processing, using distributed components that divide and conquer design process complexity and exploit communication locality to overcome wire delays. With this trend comes a renewed and increasing interest in multithreaded architectures. Such architectures can extract parallelism from a sequential program via thread-level speculation, whether control-driven or data-driven, giving them the flexibility to operate in both multiple-program, high-throughput and single-program, high-performance environments.
RATIONALE FOR SPECULATIVE MULTITHREADING
Fortunately, the twin goals of increasing single-program performance and decreasing implementation difficulty don't necessarily conflict. The motivation for using speculative multithreading comes from two directions: on the one hand, we are already witnessing the diminishing potential of current techniques to extract parallelism from single programs and thus increase their performance; on the other, technology trends suggest the onset of commercial processors that can simultaneously execute multiple independent threads [1]. Thus, we are almost compelled to find innovations that will enable multithreaded proce...