Abstract-Variable block-size motion estimation (VBSME) has become an important video coding technique, but it increases the difficulty of hardware design. In this paper, we use inter-/intralevel classification and various data flows to analyze the impact of supporting VBSME in different hardware architectures. Furthermore, we propose two hardware architectures that can support traditional fixed block-size motion estimation as well as VBSME with less chip area overhead compared to previous approaches. By broadcasting reference pixel rows and propagating partial sums of absolute differences (SADs), the first design has the fewer reference pixel registers and a shorter critical path. The second design utilizes a two-dimensional distortion array and one adder tree with the reference buffer that can maximize the data reuse between successive searching candidates. The first design is suitable for low resolution or a small search range, and the second design has advantages of supporting a high degree of parallelism and VBSME. Finally, we propose an eight-parallel SAD tree with a shared reference buffer for H.264/AVC integer motion estimation (IME). Its processing ability is eight times of the single SAD tree, but the reference buffer size is only doubled. Moreover, the most critical issue of H.264 IME, which is huge memory bandwidth, is overcome. We are able to save 99.9% off-chip memory bandwidth and 99.22% on-chip memory bandwidth. We demonstrate a 720-p, 30-fps solution at 108 MHz with 330.2k gate count and 208k bits on-chip memory.Index Terms-Block matching, H.264/AVC, motion estimation (ME), variable block size, very large scale integration (VLSI) architecture.
Abstract-In this paper, a low-power full-search block matching (FSBM) motion-estimation design for ITU-T recommendation H.263+ standard was proposed. New motion-estimation modes in H.263+ can be fully supported by our architecture. Unlike most previously presented motion-estimation chips, this design can deal with 8 8 and 16 16 block size with different searching ranges. Basically, the proposed architecture is composed of an integer pixel unit with 64 processing elements, and a half-pixel unit with interpolation, a control unit, and data registers. In order to minimize power consumption, gated-clock and dual-supply voltages are used. This design has been realized by TSMC 0.6 m SPTM CMOS technology. The power consumption is 423.8 mW at 60 MHz and the throughput is 36 fps in CIF format.
The emerging video coding standard, H.264/MPEG-4 AVC, allows using multiple reference frames for motion estimation to enhance temporal prediction. Exhaustive search requires a lot of computation, which is proportional to the number of searched frames. However, the reduction of prediction residues is highly dependent on the characteristics of video sequences. In many cases, searching more reference frames contributes to nothing but only waste of computation. In this paper, we proposed a novel algorithm to accelerate the motion estimation by using reference frame skipping criteria. We adopted selected macroblock mode, intra/inter macroblock prediction residues, discontinuity of motion vectors, and scene changes in the criteria. By using these criteria, each macroblock can determine whether it is necessary to keep on searching more reference frames after the block matching process in the first reference frame. Simulation results show that the proposed algorithm can save up to 67% of ME computation without noticeable degradation of video quality.
This paper presents a parallel architecture for the Embedded Block Coding (EBC) in JPEG 2000. The architecture is based on the proposed word-level EBC algorithm. By processing all the bit planes in parallel, the state variable memories for the context formation (CF) can be completely eliminated. The length of the FIFO (first-in first-out) between the CF and the arithmetic encoder (AE) is optimized by a reconfigurable FIFO architecture. To reduce the hardware cost of the parallel architecture, we proposed a folded AE architecture. The parallel EBC architecture can losslessly process 54 MSamples/s at 81 MHz, which can support HDTV 720p resolution at 30 frames/s. Index Terms-Discrete wavelet transform (DWT), embedded block coding (EBC), EBC with optimized truncation (EBCOT), image processing, JPEG 2000, parallel processing.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.