Wen-Chang Yeh scite author profile

Jen

2000

IEEE Trans. Comput.

197

This paper presents a design methodology for a high-speed Booth-encoded parallel multiplier. For partial product generation, we propose a new modified Booth encoding (MBE) scheme to improve the performance of traditional MBE schemes. For the final addition, a new algorithm is developed to construct a multiple-level conditional-sum adder (MLCSMA). The proposed algorithm can optimize the final adder according to the given cell properties and input delay profile. Compared with a binary tree-based conditional-sum adder, the speed improvement is up to 25 percent. On average, the design developed herein reduces the total delay of the parallel multiplier by 8 percent. The whole design has been verified by gate-level simulation.
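The following Python sketch shows the conventional radix-4 modified Booth recoding that the abstract uses as its baseline. It is not the new MBE scheme proposed in the paper, whose details are not given here, and the function names `booth_radix4_digits` and `booth_multiply` are hypothetical. Each bit pair of the multiplier, together with the bit just below it, is recoded into a digit in {-2, -1, 0, 1, 2}. As a result, an n-bit multiplier produces only n/2 partial products.

```python
def booth_radix4_digits(y: int, n: int) -> list[int]:
    """Conventional radix-4 (modified) Booth recoding of an n-bit
    two's-complement multiplier y. Assumption: n is even.

    Digit k is -2*y[2k+1] + y[2k] + y[2k-1], with y[-1] = 0.
    """
    assert n % 2 == 0, "this sketch assumes an even operand width"
    digits = []
    prev = 0  # y[2k-1]; the implicit bit below the LSB is 0
    for i in range(0, n, 2):
        b0 = (y >> i) & 1        # y[2k] (Python's >> sign-extends negative y)
        b1 = (y >> (i + 1)) & 1  # y[2k+1]
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def booth_multiply(x: int, y: int, n: int) -> tuple[int, list[int]]:
    """Multiply x by the n-bit multiplier y using n/2 Booth partial products.

    Each partial product is digit * x shifted left by 2k bits. In hardware the
    digit only selects 0, +/-x, or +/-2x, so no true multiplication is needed.
    """
    partials = [d * x * 4**k for k, d in enumerate(booth_radix4_digits(y, n))]
    return sum(partials), partials
```

For example, `booth_radix4_digits(6, 8)` gives `[-2, 2, 0, 0]`. `booth_multiply(7, 6, 8)` therefore sums the partial products `[-14, 56, 0, 0]` to 42.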

IEEE Trans. Signal Process.

High-speed and low-power split-radix FFT

Yeh,

Jen

2003

130

This paper presents a novel split-radix fast Fourier transform (SRFFT) pipeline architecture design. A mapping methodology has been developed to obtain a regular and modular pipeline for the split-radix algorithm. The pipeline is repartitioned to balance the latency between complex multiplication and butterfly operation by using carry-save addition. The number of complex multipliers is minimized via a bit-inverse and bit-reverse data scheduling scheme. The design methodology described here can also be applied to obtain regular and modular pipelines for other Cooley-Tukey-based algorithms. For an $N$ ($= 2^n$)-point FFT, the requirements are $\log_4 N - 1$ multipliers, $4\log_4 N$ complex adders, and memory of size $N - 1$ complex words for data reordering. The initial latency is $N + 2\log_2 N$ clock cycles. On average, it completes an $N$-point FFT in $N$ clock cycles. Post-layout simulations show a maximum clock rate of 150 MHz (75 MHz) at 3.3 V (2.7 V) and 25 °C (100 °C), using a 0.35-μm cell library from Avant!. A 64-point SRFFT pipeline design has been implemented and consumes 507 mW at 100 MHz, 3.3 V, and 25 °C. Compared with a radix-2² FFT implementation, power consumption is reduced by 15% and speed is improved by 14.5%.
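The following Python sketch shows the split-radix decomposition that the pipeline maps into hardware. It is a plain recursive software formulation (decimation in time, $W_N = e^{-2\pi i/N}$) and not the paper's pipeline, mapping, or data-scheduling scheme. The names `split_radix_fft` and `naive_dft` are hypothetical. An $N$-point DFT is split into one $N/2$-point DFT of the even samples and two $N/4$-point DFTs of the samples at indices $4m+1$ and $4m+3$. These are recombined with twiddle factors $W_N^k$ and $W_N^{3k}$ in an L-shaped butterfly.

```python
import cmath


def split_radix_fft(x: list[complex]) -> list[complex]:
    """Recursive split-radix DIT FFT. Assumption: len(x) is a power of two."""
    n = len(x)
    if n == 1:
        return [x[0]]
    if n == 2:
        return [x[0] + x[1], x[0] - x[1]]
    u = split_radix_fft(x[0::2])   # N/2-point DFT of even samples
    z = split_radix_fft(x[1::4])   # N/4-point DFT of samples 1, 5, 9, ...
    zp = split_radix_fft(x[3::4])  # N/4-point DFT of samples 3, 7, 11, ...
    out = [0j] * n
    q = n // 4
    for k in range(q):
        a = cmath.exp(-2j * cmath.pi * k / n) * z[k]       # W_N^k  * Z[k]
        b = cmath.exp(-2j * cmath.pi * 3 * k / n) * zp[k]  # W_N^3k * Z'[k]
        s, d = a + b, a - b
        # L-shaped butterfly: a radix-2 half (k, k + N/2) and a radix-4 half
        out[k] = u[k] + s
        out[k + 2 * q] = u[k] - s
        out[k + q] = u[k + q] - 1j * d
        out[k + 3 * q] = u[k + q] + 1j * d
    return out


def naive_dft(x: list[complex]) -> list[complex]:
    """O(N^2) reference DFT, used only to check the sketch."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]
```

The recursion needs only one complex multiplication per twiddle in the $N/4$-point branches. This is the property split-radix exploits to lower multiplier count relative to radix-2.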

Index rendering: hardware-efficient architecture for 3-D graphics in multimedia system

Liang

Lee

et al. 2002

IEEE Trans. Multimedia

Real-time three-dimensional (3-D) graphics is emerging rapidly in multimedia applications, but it demands heavy computation, high bandwidth, and large buffers. To achieve hardware efficiency for 3-D graphics rendering, we propose a novel approach named index rendering. The basic concept of index rendering is to realize the 3-D rendering pipeline with asynchronous multiple dataflows. This works because triangle information can be divided into several parts, each of which can be transferred independently and asynchronously. Finally, all data are merged through the index to generate the final image. Index rendering can eliminate unnecessary operations in the traditional 3-D graphics pipeline, namely those caused by invisible pixels and triangles in the 3-D scene. A previous approach, deferred shading, eliminates the operations related to invisible pixels, but at a large cost in bandwidth and buffer size. Index rendering eliminates operations on both invisible pixels and invisible triangles at a smaller cost than deferred shading. Simulation and analysis results show that index rendering reduces lighting operations by 10%-70% with flat and Gouraud shading and by 30%-95% with Phong shading. Compared with deferred shading, it also saves 70% of buffer size and 50%-70% of bandwidth. The results further indicate that index rendering is especially suitable for low-cost portable rendering devices. Index rendering is therefore a hardware-efficient architecture for 3-D graphics, and it makes rendering hardware easier to integrate into multimedia systems, especially in system-on-a-chip (SOC) designs.
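The following Python toy illustrates why resolving visibility before lighting removes work on hidden triangles. It is only an assumption-laden illustration of that general idea and not the paper's asynchronous multi-dataflow architecture. The fragment format, the helper names `dot` and `index_render`, and the use of flat shading (one lighting computation per triangle) are all hypothetical choices made for this sketch. The first pass keeps only a depth and a triangle index per pixel. Lighting then runs once per triangle that owns a visible pixel, and the final image is assembled by index lookup.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]
Fragment = Tuple[int, int, float, int]  # hypothetical: (x, y, depth, triangle index)
Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def index_render(fragments: List[Fragment], normals: Dict[int, Vec3],
                 light: Vec3) -> Tuple[Dict[Pixel, float], int]:
    """Toy visibility-first renderer. Returns (image, number of lighting operations)."""
    # Pass 1: depth test only; keep the nearest depth and its triangle index per pixel.
    depth: Dict[Pixel, float] = {}
    index: Dict[Pixel, int] = {}
    for x, y, z, tri in fragments:
        if z < depth.get((x, y), float("inf")):
            depth[(x, y)] = z
            index[(x, y)] = tri
    # Pass 2: flat-shade each triangle that owns at least one visible pixel, exactly once.
    visible = set(index.values())
    shade = {tri: max(0.0, dot(normals[tri], light)) for tri in visible}
    # Pass 3: resolve the final image by looking up each pixel's triangle index.
    image = {pix: shade[tri] for pix, tri in index.items()}
    return image, len(visible)
```

A naive pipeline that lights every fragment would perform `len(fragments)` lighting operations. In this sketch, a triangle hidden entirely behind others is never lit at all.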

On the study of logarithmic time parallel adders

Jen

This work formulates a set of equations to describe logarithmic time parallel adders. The equations can be used to explain several popular fast adder schemes and to derive new adder schemes easily. It is shown that for any adder constructed from the conditional-sum rule, another adder with equivalent topology and structure can always be obtained from the carry-lookahead rule, and vice versa.

Index Terms: parallel adder, adder algorithm, conditional-sum, carry-lookahead.

I. INTRODUCTION

Addition is an essential operation in many digital signal-processing applications. Fast addition is usually achieved with logarithmic time parallel adders. Conventionally, logarithmic time parallel adders are categorized into two classes of algorithms: carry-lookahead and conditional-sum. However, it is hard to compare their performance and to determine the circuit complexity of each adder scheme. Another disadvantage of this classification is that a different optimization technique has to be developed for each adder type. Four popular logarithmic time parallel adder schemes are discussed herein: the carry-lookahead adder (CLA) [1], the ELM adder (ELMA) [2][3], the conditional-sum adder (CSMA) [4], and the conditional-carry adder (CCA) [5]. Three ternary operators are defined to simplify the analysis, and a set of equations using these operators is developed to describe these adders. These equations show that the performance of each adder scheme can be determined and compared by

0-7803-6488-0/00/$10.00 © 2000 IEEE
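The following Python sketch makes the two classes of adders concrete at the bit level. It does not reproduce the paper's three ternary operators or its construction of topologically equivalent adders, which this excerpt does not give. It only shows that one adder of each class reaches the same sum in a logarithmic number of levels. The function names `cla_add`, `csa_add`, and `_select` are hypothetical. The carry-lookahead side uses a Kogge-Stone prefix network, which is one of many possible CLA topologies. Both functions assume a carry-in of 0.

```python
Block = tuple[int, tuple[int, int], tuple[int, int]]  # (width, (sum, cout) for cin=0, same for cin=1)


def cla_add(a: int, b: int, n: int) -> tuple[int, int]:
    """n-bit carry-lookahead adder built as a Kogge-Stone parallel prefix.
    Returns (sum mod 2**n, carry-out)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]    # bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]  # bit propagate
    G, P = g[:], p[:]
    d = 1
    while d < n:  # ceil(log2 n) prefix levels; each level combines spans of width d
        G, P = ([G[i] | (P[i] & G[i - d]) if i >= d else G[i] for i in range(n)],
                [P[i] & P[i - d] if i >= d else P[i] for i in range(n)])
        d *= 2
    carries = [0] + G[:-1]  # carry into bit i = group generate of bits 0..i-1
    s = sum((p[i] ^ carries[i]) << i for i in range(n))
    return s, G[-1]


def _select(low: tuple[int, int], hi0: tuple[int, int], hi1: tuple[int, int],
            width_low: int) -> tuple[int, int]:
    """Multiplexer step: the low block's carry-out selects the high block's result."""
    hi = hi1 if low[1] else hi0
    return low[0] | (hi[0] << width_low), hi[1]


def csa_add(a: int, b: int, n: int) -> tuple[int, int]:
    """n-bit conditional-sum adder: every block precomputes its (sum, carry-out)
    for carry-in 0 and 1, and adjacent blocks are merged pairwise (ceil(log2 n) levels)."""
    blocks: list[Block] = []
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        blocks.append((1, (ai ^ bi, ai & bi), (1 - (ai ^ bi), ai | bi)))
    while len(blocks) > 1:
        merged: list[Block] = []
        for j in range(0, len(blocks) - 1, 2):
            wl, lo0, lo1 = blocks[j]
            wh, hi0, hi1 = blocks[j + 1]
            merged.append((wl + wh, _select(lo0, hi0, hi1, wl), _select(lo1, hi0, hi1, wl)))
        if len(blocks) % 2:
            merged.append(blocks[-1])  # an unpaired top block passes through to the next level
        blocks = merged
    _, (s, cout), _ = blocks[0]
    return s, cout
```

For example, `cla_add(45, 27, 6)` and `csa_add(45, 27, 6)` both return `(8, 1)`, because 45 + 27 = 72 = 64 + 8. In the carry-lookahead version, each level merges (generate, propagate) pairs. In the conditional-sum version, each level merges precomputed (sum, carry) pairs through multiplexers.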

A high performance carry-save to signed-digit recoder for fused addition-multiplication

Jen