Introduction

Rapid growth in High-Definition (HD) digital video applications has led to an increased interest in portable HD-quality encoder design. An HD-compatible MPEG2 MP@HL encoder uses the Full Search Block Matching (FSBM) algorithm for Motion Estimation (ME). The ME module accounts for more than 80% of the computational complexity of a typical video encoder. Moreover, the power consumption of an FSBM-based encoder is prohibitively high, particularly for portable implementations. Hence, efficient ME processor cores need to be designed to realize portable HDTV video encoders.

A parameterizable FSBM ASIC design that solves the input bandwidth problem by using on-chip line buffers was proposed in [15]. A family of modular VLSI architectures that allow sequential inputs but perform parallel processing with 100 percent efficiency was proposed in [18]. A systolic mapping procedure to derive FSBM architectures was proposed in [4]. The designs of [2] and [20] focused on reducing pin count by sharing memory units, while that of [5] did so through 2-dimensional data reuse. The design in [19] improved the memory bandwidth by using an overlapped data flow of the search area, which increased the processing element (PE) utilization. A low-latency, high-throughput tree architecture for FSBM was proposed in [3]. Both [13] and [1] proposed low-power architectures based on the removal of unnecessary computations. A novel low-power parallel tree FSBM architecture was proposed in [6]; it exploited the spatial data correlations within parallel candidate block searches for data sharing and thus effectively reduced data access bandwidth and power consumption. Among FPGA-based designs, [7] proposed an architecture to implement parallel computation of FSBM. Systolic array and novel OnLine Arithmetic (OLA) based designs for FSBM were proposed in [8] and [9], respectively. Customizable low-power FPGA cores were proposed in [10].
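To make the computational burden concrete, the full-search block matching that all of the architectures surveyed above accelerate can be sketched in software as follows. This is only an illustrative reference model, not the proposed hardware: the function names `sad` and `full_search`, the block size `n`, and the search range `p` are assumptions introduced here, and frames are taken to be plain lists of rows of 8-bit luminance values.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences (SAD) between the n x n block of `cur`
    whose top-left corner is (bx, by) and the block of `ref` displaced
    by (dx, dy) from that position."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n=16, p=7):
    """Exhaustively test every displacement in [-p, p] x [-p, p] and return
    (best_sad, dx, dy). Candidates that fall outside `ref` are skipped.
    n and p are hypothetical defaults, not values taken from the paper."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            inside = (0 <= by + dy and by + dy + n <= height and
                      0 <= bx + dx and bx + dx + n <= width)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            # Strict '<' keeps the first candidate found in case of ties.
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

With the assumed values n = 16 and p = 7, each macroblock requires up to (2p + 1)^2 = 225 candidate blocks of 256 absolute differences each, i.e. 57,600 absolute-difference operations per macroblock, which illustrates why ME dominates the encoder's complexity.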
The aforementioned FSBM architectures can be divided into two categories, namely, FPGA [7,8,9,10,11,17] and ASIC [4,15,18,2,3,20,5,19,13,1,6]. This work uses FPGA technology to implement high-performance ME hardware with due consideration to (a) processing speed and (b) silicon area. Almost all of the aforementioned VLSI architectures optimize only one of these parameters. The novelty of the proposed architecture lies in its combined optimization of these conflicting design requirements. The proposed hardware uses an initially-split pipeline to reduce the processing cycles for each macroblock (MB) and thus increases the throughput. In addition, this design requires fewer adders and only one Absolute Difference (AD) PE, which drastically reduces the silicon area when compared to other existing designs. The pixels of the search regions have been organized in memory banks such that two sets of 128-bit data (sixteen 8-bit pixels each) can be accessed in each clock cycle.

Section 2 gives an overview of FSBM-based motion estimation. Section 3 presents a brief discussion on SAD modifications and describes the proposed FSBM hardware. The implementation and comparative...