In the Advanced Audio Video coding Standard (AVS), the use of variable block sizes ranging from 16x16 to 8x8 in inter-frame encoding improves coding efficiency significantly compared with a fixed macroblock (MB) partition. Rate distortion optimization (RDO) is the best known mode decision method, but its extremely high computational complexity limits its application. This paper proposes an algorithm based on a visual perception model and a Sobel operator edge detection model that quickly selects the best inter mode from 16x16, 16x8, 8x16 and 8x8 using only the original pixels. We further analyze and redesign the MB-level pipeline structure and give the optimized hardware structure of the encoder. We tested sequences at different resolutions, including CIF, 720p and 1080p, and the experimental results show that the coding efficiency is comparable with that of the traditional RDO method. The proposed hardware structure reduces the area of the fractional motion estimation (FME) module by 60% and reduces the processing time by 200 cycles. The proposed mode decision architecture supports real-time processing of 1080p@30fps.
Keywords: inter mode decision, AVS, Sobel operator, visual perception determining model, hardware structure

I. INTRODUCTION

The Advanced Audio Video coding Standard (AVS) was established by the China AVS Working Group and has been accepted as an option by the ITU-T FG IPTV for IPTV applications. In AVS, there are four types of partitions in a macroblock (MB): 16x16, 16x8, 8x16 and 8x8, as shown in Fig. 1 [1].

Figure 1. Different partitions in an MB: 16x16 type (block 0), 16x8 type (blocks 0-1), 8x16 type (blocks 0-1), 8x8 type (blocks 0-3).

To achieve the highest coding efficiency, some previous works such as [2-3] used the rate distortion optimization (RDO) technique to select the best mode from all the candidate modes of the AVS standard. In RDO-based mode decision (MD), the RD costs of all the MB modes are calculated, and the mode with the minimum RD cost is chosen as the best mode. The RD cost is computed using (1) for each candidate mode:

$$\mathrm{RDcost} = D + \lambda_{mode} \cdot R \qquad (1)$$

Here $D$ is the distortion between the original picture and the reconstructed picture, measured as the sum of squared differences (SSD) in AVS RDO-based mode decision; $\lambda_{mode}$ is a weighting parameter; and $R$ is the number of coding bits for each mode. The reconstructed pixels are needed to compute the SSD. In general, obtaining the reconstructed pixels and the actual coding bits requires motion estimation, discrete cosine transform (DCT), quantization, inverse quantization, inverse DCT and entropy coding, so the whole process has very high computational complexity.

To address this complexity problem, a wide range of fast algorithms for inter mode selection have been developed. [2] provides a fast mode decision method that selects the best mode based on the spatial homogeneity and the temporal stationarity characteristics of video objects. With this algorithm, only a small number of inter modes are evaluated in the RDO process. In [3], [4], the authors discriminate between the different modes by using a threshold technique. However, the thre...