Introduction

High-performance floating-point (FP) dividers are essential arithmetic units for graphics applications and simulations, and various algorithms and implementation techniques have been proposed [3,4]. Using a 0.25 µm CMOS technology, we have developed an FP divider that supports the IEEE-754 single-precision and double-precision formats. The divider uses conventional static CMOS logic together with (a) a radix-4 SRT algorithm [3] (named after Sweeney, Robertson and Tocher, who developed the algorithm independently at about the same time) with a maximally redundant digit set, (b) a partially nonredundant remainder scheme, and (c) a simple operand prescaling. With these techniques, the divider can calculate 4 quotient digits/cycle at over 250 MHz with a 2.5 V power supply.

Figure 1 shows a block diagram of this divider. The pre-operation (PRE) block includes a pre-scaler, a divisor tripler and an exponent adder. The iteration (ITE) block connects two radix-4 SRT division blocks in series, which yields a quotient digit calculation performance of 4 digits/cycle. Figure 2 details the radix-4 SRT division block. The five most significant bits (MSBs) of the partial remainder are expressed in nonredundant form, and the quotient selection logic (Qsel) determines the quotient digits by referring only to their upper four bits (4r/5n scheme). The least significant bits (LSBs) are expressed in redundant form. The post-operation (POST) block translates the quotient from redundant to nonredundant form and performs the IEEE-754 rounding.
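
The digit recurrence underlying a radix-4 SRT block with the maximally redundant digit set can be illustrated with a short software sketch. The Python below is a minimal model under stated assumptions, not the circuit described here: the names `select_digit`, `srt4_divide` and `frac_bits` are hypothetical, the operands are assumed to lie in [1, 2), and the selection rule (rounding a truncated estimate of the shifted remainder, using an exact rational division) is only a stand-in for the Qsel logic. Operand prescaling, the partially nonredundant remainder and IEEE-754 rounding are not modeled.

```python
from fractions import Fraction
from math import floor

RADIX = 4
MAX_DIGIT = 3  # maximally redundant digit set {-3, ..., 3}


def select_digit(shifted_rem, divisor, frac_bits=3):
    """Pick a quotient digit from a truncated estimate of r*R(j).

    Assumption: truncating to `frac_bits` fractional bits stands in for
    hardware that inspects only a few MSBs. The exact rational division
    is for illustration only; it is not the divider's Qsel logic.
    """
    scale = 1 << frac_bits
    estimate = Fraction(floor(shifted_rem * scale), scale)
    digit = floor(estimate / divisor + Fraction(1, 2))
    # Clamp to the digit set; the redundancy absorbs the estimate error.
    return max(-MAX_DIGIT, min(MAX_DIGIT, digit))


def srt4_divide(x, d, num_digits):
    """Return (digits, quotient, remainder) for x / d, with x, d in [1, 2)."""
    rem = Fraction(x) / RADIX  # R(0) = x/4, so |R(0)| <= d
    d = Fraction(d)
    digits = []
    acc = 0  # signed digits accumulated into an ordinary integer
    for _ in range(num_digits):
        shifted = RADIX * rem
        q = select_digit(shifted, d)
        rem = shifted - q * d  # R(j+1) = r*R(j) - q*D
        assert abs(rem) <= d   # remainder stays bounded by the divisor
        digits.append(q)
        acc = RADIX * acc + q
    # The first digit has weight 4^0 because R(0) was pre-divided by 4.
    quotient = Fraction(acc * RADIX, RADIX ** num_digits)
    return digits, quotient, rem
```

Calling `srt4_divide(Fraction(3, 2), Fraction(5, 4), 14)` returns signed digits in {-3, ..., 3} and a quotient that differs from 3/2 ÷ 5/4 by at most 4^(1-14). The accumulation `acc = RADIX * acc + q` plays the role of the redundant-to-nonredundant translation in this sketch; a hardware implementation would typically avoid the carry-propagating subtraction that it implies.
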
Implementation of the SRT algorithm

The SRT algorithm is used as the division algorithm in many LSIs; either a radix-2 or a radix-4 SRT algorithm is usually employed as a trade-off between area and performance. In a radix-4 SRT implementation, there are two common choices for the quotient digit set: a minimally redundant digit set {-2,-1,0,1,2} and a maximally redundant digit set {-3,-2,-1,0,1,2,3}. Some MPUs employ the minimally redundant digit set to eliminate the divisor tripler; however, a quotient-digit-selection table is then required to implement the complex selection logic [2], and this complexity increases the iteration cycle delay. We have employed a radix-4 SRT algorithm with a maximally redundant quotient digit set. Although this digit set requires a divisor tripler, it allows a simple quotient selection logic, i.e., a shorter critical path and a smaller number of logic gates. Figure 3 shows the positive half of the P-D plot [3] used in this FP divider, where r is the radix of the operation (in this case, r = 4), R(j) is the partial remainder at the j-th iteration, and Q and D are the quotient and the divisor, respectively. In the case of this radix-4 SRT algorithm, the six MSBs (including the sign) of the partial remainder in a nonredundant form, and the second and third most significant