LIU Wenjiang, WANG Ruolin, LIU Tao, RONG Mengtian

Abstract: Block-matching and 3D filtering (BM3D) is a state-of-the-art denoising algorithm for images and video that takes full advantage of both the spatial and the temporal correlation of the video. Its performance comes at the price of searching for and filtering many similar blocks, which incurs heavy computation and memory access. Because of the large frame size and search range, area, memory bandwidth and computation are the major bottlenecks in designing a feasible architecture. In this paper, we introduce a novel structure that increases the data reuse rate and reduces the internal static random-access memory (SRAM). Our target is to design a real-time BM3D processing chip for phase alternating line (PAL) video. We propose an application-specific integrated circuit (ASIC) architecture of BM3D for the 720 × 576 BT656 PAL format. The chip runs at a 100 MHz system frequency with a 166 MHz 32-bit double data rate (DDR) memory. At a noise level of σ = 25, we achieve real-time denoising and a peak signal-to-noise ratio (PSNR) gain of about 10 dB with just one iteration of the BM3D algorithm.

Block-matching and 3D filtering (BM3D) [1][2], a novel image/video denoising strategy, is based on an enhanced sparse representation in the transform domain. The sparsity is enhanced by grouping similar 2D image fragments (e.g., blocks) into 3D data arrays called "groups". In our architecture, we call this step block-matching. 3D filtering is a special procedure developed to deal with these 3D data arrays. It has three successive steps: 3D transformation of a group, shrinkage of the transform spectrum, and inverse 3D transformation. The result is a 3D estimate that consists of the jointly filtered grouped image blocks.
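The three steps of 3D filtering can be illustrated with a minimal software sketch. It is not the hardware data path proposed in this paper. It assumes a separable orthonormal DCT along all three axes of a group and hard thresholding as the shrinkage rule. The function names (`dct_matrix`, `transform_3d`, `filter_group`) and the threshold factor `lam = 2.7` are illustrative assumptions rather than values taken from the text.

```python
import numpy as np


def dct_matrix(n):
    """Return the orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]   # frequency index (rows)
    i = np.arange(n)[None, :]   # sample index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # DC row has its own scaling
    return m


def transform_3d(group, inverse=False):
    """Apply a separable orthonormal DCT (or its inverse) along every axis."""
    out = np.asarray(group, dtype=float)
    for axis, n in enumerate(out.shape):
        m = dct_matrix(n)
        if inverse:
            m = m.T  # the inverse of an orthonormal matrix is its transpose
        # Contract the matrix with the chosen axis, then restore the axis order.
        out = np.moveaxis(np.tensordot(m, out, axes=([1], [axis])), 0, axis)
    return out


def filter_group(group, sigma, lam=2.7):
    """3D transform, hard-threshold shrinkage, inverse 3D transform.

    `lam` is an assumed threshold factor, not a value from this paper.
    Returns the filtered group and the number of retained coefficients.
    """
    spectrum = transform_3d(group)
    keep = np.abs(spectrum) > lam * sigma   # shrinkage by hard thresholding
    estimate = transform_3d(spectrum * keep, inverse=True)
    return estimate, int(keep.sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.full((4, 8, 8), 128.0)                # 4 similar 8x8 blocks
    noisy = clean + rng.normal(0.0, 25.0, clean.shape)
    estimate, kept = filter_group(noisy, sigma=25.0)
    print("retained coefficients:", kept)
    print("noisy MSE:   ", np.mean((noisy - clean) ** 2))
    print("filtered MSE:", np.mean((estimate - clean) ** 2))
```

Because the transform is orthonormal, noise of standard deviation σ in the pixel domain keeps the same standard deviation in every transform coefficient, which is why a single threshold proportional to σ can be applied to the whole spectrum.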
As a way of attenuating noise, the filtering reveals even the finest details shared by the grouped blocks while preserving the essential unique features of each individual block.

The filtered blocks are then returned to their original positions. Because the blocks overlap, each pixel receives many different estimates, which need to be combined. Aggregation performs this reconstruction by computing a weighted average of the estimates of each pixel.

In block-matching, BM3D finds the blocks most similar to the current block in the current frame and the reference frames, and then groups them. According to Refs. [3][4][5], block-matching is a particular matching approach that has been used extensively for motion estimation in video compression (MPEG-1, MPEG-2 and MPEG-4, and H.26x). Yap and McCanny [6] proposed a shuffling mechanism to accelerate the processing rate. In these papers, however, the macroblocks are processed in order, i.e., the motion vectors (MVs) of neighboring blocks must be considered. In this paper, we put forward a full-search method that finds the most similar blocks without the MVs of neighboring blocks, which brings more parallelism and hardware efficiency.

As a particular way of gro...