As mobile applications move toward realtime multimedia, more functions are being integrated into handheld devices [1][2][3][4]. Unlike realtime MPEG-4 video decoding [1,4], the performance of hardware-accelerated 3D solutions designed for mobile platforms still falls short of market demands, offering only limited shading operations [2,3]. Because realizing various 3D functions requires large computing power and corresponding memory bandwidth, previous LSIs integrate DRAM using embedded memory logic technology, although it is cost-inefficient due to process complexity. In this work, a graphics LSI is implemented in pure DRAM technology to integrate both logic and memory at low cost. Its circuits and architecture are optimized so that the full 3D pipeline is realized with less than 210mW at a drawing speed of 264Mtexels/s with bilinear MIPMAP texturing and antialiasing, making it applicable to handheld devices.

As shown in Fig. 2.4.1, the graphics LSI consists of a 32b RISC, a 3D rendering engine (3DRE), 29Mb DRAM, a bandwidth equalizer (BEQ), and a programmable power optimizer (PPO). The ARM9-compatible RISC with 4kB I/D caches operates at up to 132MHz [6]. The RISC includes a 32 x 32b MAC in its datapath to accelerate the 3D geometry operations, so that it can transform as many as 1.04Mvertices/s when running a customized fixed-point graphics library, a 43% improvement over the conventional ARM9 processor [5]. SlimShader performs the main rendering operations such as texturing, shading, blending, and depth comparison. The Memory Programmer (MP) makes special effects such as antialiasing, motion blur, and fog programmable. The integrated 29Mb DRAM with a partial wordline scheme [1-3] provides the bandwidth and capacity required for 3D rendering operations. Dedicated hardware engines and 1.6GB/s of bandwidth through the 416b-wide DRAM lower the operating frequency of the 3DRE to as little as 33MHz.
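The bandwidth and geometry figures above can be cross-checked with simple arithmetic. The sketch below is only a sanity check under stated assumptions, not a description of the chip's actual memory protocol: it assumes one full-width 416b DRAM access per 33MHz cycle and that the vertex rate is measured with the RISC at its 132MHz maximum, neither of which the text states explicitly. All function and constant names are illustrative.

```python
# Back-of-the-envelope check of figures quoted in the text.
# Assumptions (not stated in the source): one full-width DRAM access
# per 3DRE clock cycle; vertex rate measured at the 132MHz RISC clock.

BUS_WIDTH_BITS = 416            # from the text: 416b-wide DRAM
RENDER_CLOCK_HZ = 33_000_000    # from the text: 3DRE at 33MHz
RISC_CLOCK_HZ = 132_000_000     # from the text: RISC at up to 132MHz
VERTEX_RATE = 1_040_000         # from the text: 1.04Mvertices/s


def peak_bandwidth_bytes_per_s(bus_width_bits: int, clock_hz: int) -> float:
    """Peak bandwidth for one full-width transfer per clock cycle."""
    return bus_width_bits / 8 * clock_hz


def cycles_per_vertex(clock_hz: int, vertices_per_s: int) -> float:
    """Average RISC cycles available per transformed vertex."""
    return clock_hz / vertices_per_s


bw = peak_bandwidth_bytes_per_s(BUS_WIDTH_BITS, RENDER_CLOCK_HZ)
bw_gib = bw / 2**30  # expressed in units of 2^30 bytes
cpv = cycles_per_vertex(RISC_CLOCK_HZ, VERTEX_RATE)

print(f"peak DRAM bandwidth: {bw / 1e9:.3f} x 10^9 B/s ({bw_gib:.2f} x 2^30 B/s)")
print(f"RISC cycles per vertex: {cpv:.0f}")
```

Under these assumptions the peak bandwidth comes to about 1.72 x 10^9 bytes/s, which matches the quoted 1.6GB/s when a GB is read as 2^30 bytes, and the geometry rate corresponds to roughly 127 RISC cycles per vertex.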
To compensate for the difference in processing speed and data width between the RISC and the 3DRE, the BEQ buffers the vertex data in a 1kB dual-ported SRAM (DP-SRAM). The BEQ activates only the DP-SRAM banks needed for the required buffer size, saving 20% of the DP-SRAM power. For DSP applications, the DP-SRAM can also be used by the RISC as a scratchpad RAM. The PPO reduces the power consumption of the chip by varying the clocks of four different clock domains.

The 3DRE is shown in Fig. 2.4.2. The triangle setup engine (TSE), which contains single-cycle parallel dividers, distributes polygons to 2 pixel processors (PP). It enhances the overall performance of the 3D pipeline by accelerating setup operations, which took ~7,000 RISC cycles in the previous work [1][2][3]. A depth-first clock-gating (DFCG) scheme is applied to SlimShader for low power. DFCG prevents unnecessary shading and texturing by gating off the clock in the subsequent datapath according to the result of the depth comparison. For realtime special effects, MP post-processes the rendered pixels of the previous frame as it transfers them to the display controller, while SlimShader renders the ...