In this paper, we propose an approach for full-search variable block size motion estimation using an FPGA. In motion estimation, the current frame is divided into macroblocks, and for each macroblock the best matching block is searched for in the search area of the reference frame. In our approach, the scan direction of the macroblock in the current frame and the scan direction of the matching in the search area are optimized in order to reduce accesses to the off-chip memory banks, which store the reference frame, and to the on-chip memory banks, which cache the search area. By reducing both kinds of memory access, it becomes possible to realize high performance on a small FPGA.
I. INTRODUCTION

In a video sequence, a significant amount of temporal redundancy exists between frames. In motion estimation, a frame (the current frame) is divided into non-overlapping blocks (called macroblocks), and the best matching blocks are searched for in other frames (reference frames). The resulting offsets (motion vectors) are then used to remove the temporal redundancy. In variable block size motion estimation (VBSME), a macroblock can be segmented into smaller sub-blocks, each of which is assigned its own motion vector. VBSME improves the block-matching efficiency, but requires a high computational effort.

Many algorithms for finding the best matching blocks have been proposed. In the full-search block matching algorithm, all blocks in the search area (a fixed-size area in the reference frame) are compared with the macroblock (and its sub-blocks). The full-search algorithm gives better results than other algorithms, but its computational complexity is high. Many FPGA systems for motion estimation have been proposed [1][2][3][4][5].

In this paper, we propose an approach for full-search variable block size motion estimation using an FPGA. In our approach, 1) a zigzag scan of the macroblock in the current frame, which reduces accesses to the off-chip memory banks, and 2) on-the-fly update of the cached data using the dual-port access of the on-chip memory banks, which minimizes the number of on-chip memory banks required, are combined with already-known techniques such as 1) a sum-of-absolute-differences (SAD) unit for variable block sizes, 2) scanning the image using (K × L) SAD units arranged as a box (the macroblock size can be enlarged to (16 × K) × (16 × L) pixels), which makes it possible to reuse the cached pixels on the FPGA among those SAD units,