This fourth-generation Alpha microprocessor runs at 1.2GHz and delivers up to 44.8GB/s of chip pin bandwidth. It contains a 1.75MB, ECC-protected, 7-way set-associative second-level write-back cache that delivers 19.2GB/s of bandwidth; two memory controllers supporting eight Rambus™ channels running at 800Mb/s; four 6.4GB/s inter-processor communication ports; and a separate IO port capable of 6.4GB/s operation. The 21.1×18.8mm² chip contains 152M transistors and dissipates 125W at 1.5V. It is packaged in a 1443-pin LGA package and is air-cooled. It is fabricated in a 0.18µm bulk CMOS process with 7 levels of copper interconnect. The chip is partitioned into four clocking domains and uses a digital delay line loop to provide low-skew, edge-rate-controlled synchronous clocks across the chip. Figure 15.6.1 shows the major functional units. The CPU is leveraged from an earlier Alpha design [1,2].

The Level-2 Cache Controller Unit manages the L2 cache and coordinates with the memory controllers and the router to service memory fills and cache-coherence requests. It queues up to 16 Level-1 cache miss addresses, 16 L1/L2 victim addresses, 35 cache-coherence request and response addresses, and 4 write IO addresses. The address queues are implemented in a single 73-entry Address Register File with multiple read, write, and CAM ports (Figure 15.6.2). To minimize routing distance, data can be read or written every cycle from either the top or the bottom of the register file. The entry status bits are stored in a separate array. The address queues and status arrays generate up to 25 requests, which the arbiter dynamically schedules onto 6 unique resources every cycle.

Misses from local or remote processors arrive at the memory controller and are stored in two 32-entry Directory In Flight Tables (DIFT). Incoming transactions are compared against pending DIFT entries.
Depending on the result of the CAM lookup, the transaction is either serialized to avoid hazards, merged to deliver a response to a pending transaction, or allocated a new entry. The DIFT tracks the coherence state for 32 possible outstanding transactions, and its issue logic picks one of the 32 arbitrating entries. A scoreboard tracks the availability of 3 physical resources, which are used by 5 different classes of transactions. The issue logic is pipelined over 2 cycles and accounts for entries selected in the previous cycle. The DIFT can issue to the 3 resources each cycle, one of which is the DRAM controller (Figure 15.6.3). The DIFT arbiter can pick an entry each cycle while avoiding deadlock and maintaining fairness.

The DRAM controller (Figure 15.6.4) has a conflict-detection pipeline that maximizes parallelism and minimizes DRAM bank conflicts. In the 1st cycle, it translates the request address into device, bank, row, and column fields optimized for the RDRAM™ configuration. In the 2nd cycle, it indexes a table that tracks open pages and active banks in the RDRAMs and determines the appropriate action in the memory system. It then compares the address against a 2...
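As a sanity check on the headline figure, the port bandwidths quoted at the start of this section can be summed. The sketch below assumes each Rambus channel has a 16-bit data path (an assumption: the channel width is not stated in the text), so 800Mb/s per pin gives 1.6GB/s per channel. Under that assumption the eight memory channels, the four inter-processor ports, and the IO port add up to the stated 44.8GB/s, and the 19.2GB/s L2 figure corresponds to 16 bytes per 1.2GHz cycle.

```python
# Sanity check of the chip pin-bandwidth budget quoted in the text.
# ASSUMPTION: each Rambus channel has a 16-bit (2-byte) data path, ECC excluded.

GB = 1e9  # bandwidths are quoted in decimal GB/s

RAMBUS_CHANNELS = 8
RAMBUS_RATE_PER_PIN = 800e6   # 800 Mb/s per data pin
RAMBUS_DATA_BYTES = 2         # assumed 16-bit channel width

memory_bw = RAMBUS_CHANNELS * RAMBUS_RATE_PER_PIN * RAMBUS_DATA_BYTES
interproc_bw = 4 * 6.4 * GB   # four inter-processor ports
io_bw = 6.4 * GB              # one separate IO port
total_bw = memory_bw + interproc_bw + io_bw

# The L2 cache bandwidth expressed per core clock cycle.
CLOCK_HZ = 1.2e9
l2_bytes_per_cycle = 19.2 * GB / CLOCK_HZ

if __name__ == "__main__":
    print(f"memory channels: {memory_bw / GB:.1f} GB/s")
    print(f"inter-processor: {interproc_bw / GB:.1f} GB/s")
    print(f"IO port:         {io_bw / GB:.1f} GB/s")
    print(f"total pin bw:    {total_bw / GB:.1f} GB/s")
    print(f"L2 bytes/cycle:  {l2_bytes_per_cycle:.1f}")
```

If the channel width assumption is dropped, the 44.8GB/s total can still be read as the sum of 12.8GB/s of memory bandwidth plus 32GB/s across the five 6.4GB/s ports.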
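The three outcomes of the DIFT CAM lookup (serialize, merge, or allocate a new entry) can be modeled in software. This is a hypothetical sketch: the text does not say what distinguishes a merge from a serialization, so the sketch assumes that an incoming response matching a pending entry's cache line merges with it, while a new request to a line already in flight is serialized. The 64-byte line size, the `Txn` type, and the free-on-merge policy are illustrative assumptions, and the sequential scan stands in for the hardware's parallel CAM compare.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Action(Enum):
    SERIALIZE = auto()  # conflicting request: must wait behind the in-flight entry
    MERGE = auto()      # response: completes the matching pending entry
    ALLOCATE = auto()   # no match: a new DIFT entry is created


@dataclass
class Txn:
    addr: int
    is_response: bool  # hypothetical flag: True if this answers an earlier request


class DirectoryInFlightTable:
    """Hypothetical model of one 32-entry DIFT with CAM-style address matching."""

    def __init__(self, size: int = 32, line_bytes: int = 64) -> None:
        self.line_bytes = line_bytes  # assumed coherence granularity
        self.entries: list[Optional[Txn]] = [None] * size

    def _line(self, addr: int) -> int:
        return addr // self.line_bytes

    def lookup(self, txn: Txn) -> tuple[Action, Optional[int]]:
        """Compare txn against every valid entry (a CAM does this in parallel)."""
        for idx, entry in enumerate(self.entries):
            if entry is not None and self._line(entry.addr) == self._line(txn.addr):
                return (Action.MERGE if txn.is_response else Action.SERIALIZE), idx
        return Action.ALLOCATE, None

    def accept(self, txn: Txn) -> tuple[Action, Optional[int]]:
        """Apply the lookup result; returns the action and the entry index used."""
        action, idx = self.lookup(txn)
        if action is Action.ALLOCATE:
            if None not in self.entries:
                raise RuntimeError("DIFT full: transaction must be retried later")
            idx = self.entries.index(None)
            self.entries[idx] = txn
        elif action is Action.MERGE:
            self.entries[idx] = None  # in this sketch a merged response retires the entry
        # SERIALIZE: the caller holds txn until the conflicting entry retires.
        return action, idx


if __name__ == "__main__":
    dift = DirectoryInFlightTable()
    print(dift.accept(Txn(0x1000, is_response=False)))  # new line: allocated
    print(dift.accept(Txn(0x1020, is_response=False)))  # same 64-byte line: serialized
    print(dift.accept(Txn(0x1000, is_response=True)))   # response: merged
```

Keeping the match at cache-line granularity rather than exact address is a design choice of the sketch; it makes two requests to different bytes of one line collide, which is the kind of hazard serialization is meant to prevent.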
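The first two stages of the DRAM controller pipeline can be sketched as an address decode followed by an open-page table lookup. Everything concrete here is hypothetical: the field widths and their bit order, the `DramAction` names, and the policy of leaving a row open after each access are illustrative assumptions, not the RDRAM™ mapping the chip actually uses. The table is keyed by (device, bank) and records which row, if any, is currently open.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum

# Hypothetical field widths in bits, listed from least to most significant.
OFFSET_BITS = 4   # byte offset within one access (assumed)
COL_BITS = 7
BANK_BITS = 5
DEV_BITS = 5
ROW_BITS = 9


@dataclass(frozen=True)
class DramAddr:
    device: int
    bank: int
    row: int
    column: int


class DramAction(Enum):
    PAGE_HIT = "column access to the already-open row"
    ACTIVATE = "activate the row, then column access"
    PRECHARGE_ACTIVATE = "close the open row, activate the new one, then column access"


def _field(value: int, shift: int, bits: int) -> int:
    """Extract `bits` bits of `value` starting at bit position `shift`."""
    return (value >> shift) & ((1 << bits) - 1)


def decode(addr: int) -> DramAddr:
    """Stage 1: split a physical address into device, bank, row, and column."""
    shift = OFFSET_BITS
    column = _field(addr, shift, COL_BITS)
    shift += COL_BITS
    bank = _field(addr, shift, BANK_BITS)
    shift += BANK_BITS
    device = _field(addr, shift, DEV_BITS)
    shift += DEV_BITS
    row = _field(addr, shift, ROW_BITS)
    return DramAddr(device=device, bank=bank, row=row, column=column)


class OpenPageTable:
    """Stage 2: track which row is open in each (device, bank) pair."""

    def __init__(self) -> None:
        self.open_rows: dict[tuple[int, int], int] = {}

    def classify(self, a: DramAddr) -> DramAction:
        open_row = self.open_rows.get((a.device, a.bank))
        if open_row is None:
            return DramAction.ACTIVATE            # bank idle
        if open_row == a.row:
            return DramAction.PAGE_HIT            # row already open
        return DramAction.PRECHARGE_ACTIVATE      # bank conflict: different row open

    def record_access(self, a: DramAddr) -> None:
        self.open_rows[(a.device, a.bank)] = a.row  # open-page policy (assumed)
```

Placing the column bits lowest means consecutive accesses tend to stay within one open row, which is one common way to raise the page-hit rate; the real mapping may order the fields differently.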