Variable block size motion estimation has contributed greatly to achieving an optimal interframe encoding, but involves high computational complexity and huge memory access, which is the most critical bottleneck in ultra-high-definition video encoding. This article presents a hardware-efficient block matching algorithm with an efficient hardware design that is able to reduce the computational complexity of motion estimation while providing a sustained and steady coding performance for high-quality video encoding. A three-level memory organization is proposed to reduce memory bandwidth requirement while supporting a predictive common search window. By applying multiple search strategies and early termination, the proposed design provides 1.8 to 3.7 times higher hardware efficiency than other works. Furthermore, on-chip memory has been reduced by 96.5% and off-chip bandwidth requirement has been reduced by 39.4% thanks to the proposed three-level memory organization. The corresponding power consumption is only 198mW at the highest working frequency of 500MHz. The proposed design is attractive for high-quality video encoding in real-time applications with low power consumption.