As mobile applications move toward real-time multimedia, more functions are being integrated into handheld devices [1][2][3][4]. Real-time MPEG-4 video decoding is already achieved on such devices [1,4]. Hardware-accelerated 3D solutions for mobile platforms, however, still fall short of market demands and support only limited shading operations [2,3]. Realizing the various 3D functions requires huge computing power and correspondingly high memory bandwidth. Previous LSIs therefore integrate DRAM using embedded memory logic technology, although that approach is cost-inefficient because of its process complexity. In this work, a graphics LSI is implemented in pure DRAM technology so that logic and memory are integrated at low cost. Its circuits and architecture are optimized so that the full 3D pipeline, including bilinear MIPMAP texturing and antialiasing at a drawing speed of 264Mtexels/s, consumes less than 210mW, making it applicable to handheld devices.
As shown in Fig. 2.4.1, the graphics LSI consists of a 32b RISC, a 3D rendering engine (3DRE), 29Mb of DRAM, a bandwidth equalizer (BEQ) and a programmable power optimizer (PPO). The ARM-9-compatible RISC with 4kB I/D caches operates at up to 132MHz [6]. The RISC includes a 32×32b MAC in its datapath to accelerate 3D geometry operations. Running a customized fixed-point graphics library, it can transform up to 1.04Mvertices/s, a 43% improvement over the conventional ARM9 processor [5]. SlimShader performs the main rendering operations such as texturing, shading, blending and depth comparison. The Memory Programmer (MP) makes special effects such as antialiasing, motion blur and fog programmable. The integrated 29Mb DRAM with a partial wordline scheme [1-3] provides the bandwidth and capacity required for 3D rendering operations. Together, the dedicated hardware engines and the 1.6GB/s bandwidth of the 416b-wide DRAM allow the 3DRE to run at a frequency as low as 33MHz.
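The headline figures above are mutually consistent, and a few lines of arithmetic make the relationship explicit. The sketch below is a consistency check under stated assumptions, not the authors' derivation. It assumes that each of the two pixel processors delivers one bilinearly filtered pixel per 33MHz clock, that a bilinear sample reads a 2×2 footprint of 4 texels, that the 416b DRAM port transfers one word per 3DRE clock, and that "GB" means 2^30 bytes. The constant names are illustrative.

```python
# Consistency check of the throughput figures quoted for the graphics LSI.
# ASSUMPTIONS (not stated in the text): one bilinear pixel per PP per clock,
# 4 texels per bilinear sample, one 416-bit DRAM transfer per 3DRE clock,
# and "GB" interpreted as 2**30 bytes.

PP_COUNT = 2               # pixel processors fed by the triangle setup engine
CLOCK_3DRE_HZ = 33e6       # 3DRE operating frequency
TEXELS_PER_BILINEAR = 4    # 2x2 texel footprint of bilinear filtering
DRAM_WIDTH_BITS = 416      # width of the on-chip DRAM interface
RISC_CLOCK_HZ = 132e6      # RISC clock frequency
RISC_VERTEX_RATE = 1.04e6  # transformed vertices per second on the RISC


def texel_rate() -> float:
    """Texels fetched per second by both pixel processors together."""
    return PP_COUNT * CLOCK_3DRE_HZ * TEXELS_PER_BILINEAR


def dram_bandwidth_gib() -> float:
    """DRAM bandwidth in GiB/s for one 416-bit transfer per 3DRE clock."""
    bytes_per_s = DRAM_WIDTH_BITS / 8 * CLOCK_3DRE_HZ
    return bytes_per_s / 2**30


def cycles_per_vertex() -> float:
    """RISC clock cycles available per transformed vertex."""
    return RISC_CLOCK_HZ / RISC_VERTEX_RATE


print(f"{texel_rate() / 1e6:.0f} Mtexels/s")
print(f"{dram_bandwidth_gib():.2f} GiB/s")
print(f"{cycles_per_vertex():.0f} cycles/vertex")
```

Under these assumptions the script prints 264 Mtexels/s, 1.60 GiB/s and 127 cycles/vertex. In decimal units, 52 bytes × 33MHz is about 1.72×10^9 B/s, so the quoted 1.6GB/s matches only if GB is read as 2^30 bytes. The budget of roughly 127 RISC cycles per vertex shows why a single-cycle 32×32b MAC matters for a 4×4 matrix transform.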
To compensate for the differences in processing speed and data width between the RISC and the 3DRE, the BEQ buffers vertex data in a 1kB dual-ported SRAM (DP-SRAM). The BEQ activates only as many DP-SRAM banks as the required buffer size demands, which saves 20% of the DP-SRAM power. For DSP applications, the RISC can also use the DP-SRAM as a scratchpad RAM. The PPO reduces the chip's power consumption by varying the clocks of four different clock domains. The 3DRE is shown in Fig. 2.4.2. The triangle setup engine (TSE), which contains single-cycle parallel dividers, distributes polygons to two pixel processors (PPs). It improves the overall performance of the 3D pipeline by accelerating setup operations that took ~7,000 RISC cycles in previous work [1][2][3]. A depth-first clock-gating (DFCG) scheme is applied to SlimShader for low power. Based on the result of the depth comparison, DFCG gates off the clock of the subsequent datapath, preventing unnecessary shading and texturing. For real-time special effects, the MP post-processes the rendered pixels of the previous frame as it transfers them to the display controller, while SlimShader renders the ...
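The depth-first ordering behind DFCG can be illustrated with a small behavioral model. It is a sketch, not the SlimShader design. It assumes a less-than depth test with an infinity-cleared depth buffer and a hypothetical per-fragment `shade` callback that stands in for texturing and shading. Fragments that fail the depth test never reach `shade`; in hardware, the downstream datapath clock would be gated off for those fragments. The model counts the gated fragments but does not estimate power.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Fragment:
    x: int
    y: int
    z: float  # ASSUMPTION: smaller depth value = closer to the viewer
    u: float  # texture coordinates passed to the shading stage
    v: float


@dataclass
class GatingStats:
    depth_tests: int = 0
    shaded: int = 0  # fragments whose texturing/shading stage was clocked
    gated: int = 0   # fragments whose downstream stages were gated off


def render_depth_first(fragments: List[Fragment],
                       depth_buffer: List[List[float]],
                       color_buffer: List[List[Any]],
                       shade: Callable[[Fragment], Any],
                       stats: GatingStats) -> None:
    """Run the depth comparison first; shade only fragments that pass it."""
    for f in fragments:
        stats.depth_tests += 1
        if f.z >= depth_buffer[f.y][f.x]:
            # Hidden fragment: the texturing/shading datapath would stay idle
            # (clock gated), so no texels are fetched for it.
            stats.gated += 1
            continue
        stats.shaded += 1
        depth_buffer[f.y][f.x] = f.z
        color_buffer[f.y][f.x] = shade(f)


def demo():
    """Three fragments hitting one pixel; the middle one is occluded."""
    depth = [[float("inf")]]
    color: List[List[Any]] = [[None]]
    stats = GatingStats()
    frags = [Fragment(0, 0, 0.5, 0.0, 0.0),
             Fragment(0, 0, 0.8, 0.0, 0.0),
             Fragment(0, 0, 0.2, 1.0, 1.0)]
    render_depth_first(frags, depth, color, lambda f: (f.u, f.v), stats)
    return stats, depth, color
```

In `demo()`, the second fragment lies behind the first and is gated, so only two of the three fragments are shaded. A pipeline that textured fragments before the depth test would spend texture bandwidth and shading work on all three.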
Real-time 3D graphics has become one of the most attractive applications for 3G wireless terminals, although battery lifetime and memory bandwidth limit the system resources available for graphics processing. Instead of a dedicated hardware engine with complex functions, we propose an efficient architecture for a low-power programmable vertex shader. The architecture has three features: I) a fixed-point SIMD datapath that exploits parallelism in vertex processing while keeping power consumption low; II) a multithreaded coprocessor interface that reduces unwanted stalls between the main processor and the vertex shader and lowers power consumption through instruction-level power management; and III) a programmable vertex engine that increases datapath throughput by operating concurrently with the main processor. Simulation results show that the full 3D geometry pipeline runs at 7.2M vertices/s with 115mW power consumption for polygons using the OpenGL lighting model. Measured as processing speed normalized by power consumption (Kvertices/s per mW), this is about 10 times better than the latest graphics core with a floating-point datapath for wireless applications.
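The core of a fixed-point geometry datapath is a 4×4 matrix times homogeneous-vertex product. The sketch below uses a Q16.16 format and accumulates each row's four products at full width before a single rescale, as a multiply-accumulate unit with a wide accumulator would. The abstract does not state the shader's number format or SIMD width. Q16.16 is assumed here because it is the `GLfixed` format of OpenGL ES, and four lanes match the four vertex components. All names are illustrative. The second function computes the normalized metric quoted above: 7.2M vertices/s at 115mW is about 62.6 Kvertices/s per mW.

```python
FRAC_BITS = 16            # ASSUMPTION: Q16.16 fixed-point format
ONE = 1 << FRAC_BITS


def to_fixed(x: float) -> int:
    """Convert a real number to Q16.16."""
    return int(round(x * ONE))


def to_float(q: int) -> float:
    """Convert a Q16.16 value back to a real number."""
    return q / ONE


def transform_vertex(m, v):
    """Multiply a 4x4 Q16.16 matrix (list of rows) by a Q16.16 4-vector.

    Each row is a 4-wide dot product; a 4-lane SIMD datapath would form
    the four lane products in parallel. Products are summed at full
    precision (a wide accumulator) and rescaled once with an arithmetic
    shift, which floors toward negative infinity.
    """
    return [sum(m[r][c] * v[c] for c in range(4)) >> FRAC_BITS
            for r in range(4)]


def kvertices_per_mw(vertices_per_s: float, power_mw: float) -> float:
    """Throughput normalized by power, in Kvertices/s per mW."""
    return vertices_per_s / 1e3 / power_mw


# Translation by (1, 2, 3) applied to the point (1, 1, 1).
translate = [[to_fixed(x) for x in row] for row in
             [[1, 0, 0, 1], [0, 1, 0, 2], [0, 0, 1, 3], [0, 0, 0, 1]]]
point = [to_fixed(x) for x in (1.0, 1.0, 1.0, 1.0)]
print([to_float(q) for q in transform_vertex(translate, point)])
print(round(kvertices_per_mw(7.2e6, 115), 1))
```

The example prints `[2.0, 3.0, 4.0, 1.0]` and `62.6`. Rescaling once per row rather than after every product keeps more precision for the same number of shifts, which helps a fixed-point datapath approach floating-point accuracy without the floating-point power cost.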