A quad-issue microprocessor chip implements a 64b architecture extension to a popular 32b RISC instruction set. Additional instructions and dedicated hardware provide up to 10x speed-up of image processing and rendering algorithms, including video compression/decompression and texture-mapped 3D triangles. The chip contains 5.2M drawn transistors on a 17.7 x 17.8 mm² die in 0.5µm CMOS with 4 metal layers (Figure 1, Table 1). The package is a 520-pin plastic BGA with 187 power and ground pins. Operating at 167MHz, it dissipates less than 30W from a 3.3V supply.

An instruction prefetch unit contains a two-way 16kB instruction cache, a 64-entry fully-associative instruction TLB, and a dual-ported next-field RAM. The next-field RAM contains two branch predictors, a next cache line index, and a set prediction bit for every four instructions in the instruction cache. Prefetched instructions are placed in a twelve-entry instruction buffer. 32 predecode bits are added to each instruction to simplify instruction grouping. Grouping logic performs in-order dispatch of up to four instructions per clock, using a custom dynamic block containing 250 comparators and a standard-cell block containing 68k transistors.

Delayed reset logic is used in the RAM structures, with some advantages over domino or postcharge logic (Figure 2). Selective reset/recovery results in a self-timed gate relative to the input.

Clocks are distributed only to the input stage (not shown) generating the timing. Power is minimized since only used stages are reset/recovered (unlike domino logic). Timing of the forward and reset states is controlled from the input stage, eliminating the embedded timing of the postcharge concept [1]. The recovery state is locally self-timed in each local gate. The stages can be temporally pipelined similarly to postcharge logic, with a cycle-time advantage relative to domino logic due to the absence of a long precharge period. Note that this is not dual-rail true/complement self-generating logic.
This is acceptable in the restricted application. This logic is self-resetting [2]. Gate delays are similar to other dynamic logic styles and approximately 50% of CMOS gate delays. However, unlike pure dynamic styles, all nodes are held by low impedance keepers and are static when clock (not shown) is held quiescent.

Alternating n stages and p stages are used. A reset input and reset output are added to the normal inputs and output. The gate normally receives the reset input from the immediately preceding stage in the case of local blocks (Figure 3). A second method for generating the reset is to locally invert the out signal from the previous stage. This has a cycle-time penalty. The p stage functions similarly to the n stage with all signals and transistors inverted. The n stage quiescent condition begins with nodes out and rst-out in the high state and both node in-a and node in-b in the low state with keeper transistors holding these nodes (Figure 2, identified with a k) and node r in the high state and node f in the low state. The forward propagation stat...