High Definition (HD) video compression enables vivid reproduction of scenes. However, Motion Estimation (ME) requires large memory capacity and very high memory bandwidth, both of which are undesirable on many platforms, including ASIC and SoC designs. In this paper, an algorithm and architecture design for a cache system and fast ME for HD H.264/AVC are proposed. With the proposed cache system and hardware-oriented fast ME algorithm, the rate-distortion performance is maintained within a 0.03 dB difference, the on-chip memory size is reduced to only 10% to 21% of the original size, and the external memory bandwidth due to cache refill is 18% to 56% lower than that of the Level C data reuse scheme with a vertical search range of 64.
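
For context on the Level C baseline mentioned above, the following is a minimal back-of-envelope sketch of the commonly cited Level C data reuse model, in which the on-chip buffer holds roughly (SR_H + N - 1) x (SR_V + N - 1) pixels and each reference pixel is fetched from external memory about 1 + SR_V/N times. The function names, the macroblock size N = 16, and the example search range values are assumptions chosen for illustration; they are not taken from this paper, and the sketch does not model the proposed cache system.

```python
def level_c_buffer_pixels(sr_h, sr_v, n=16):
    """Approximate on-chip search-window buffer size (in pixels) for Level C reuse.

    sr_h, sr_v: horizontal and vertical search range sizes (assumed values).
    n: macroblock width/height (assumed 16, as in H.264/AVC macroblocks).
    """
    return (sr_h + n - 1) * (sr_v + n - 1)


def level_c_redundancy_factor(sr_v, n=16):
    """Approximate number of times each reference pixel is read from external memory."""
    return 1 + sr_v / n


if __name__ == "__main__":
    # Hypothetical configuration: horizontal range 128, vertical range 64.
    sr_h, sr_v = 128, 64
    print("buffer pixels:", level_c_buffer_pixels(sr_h, sr_v))
    print("redundancy factor:", level_c_redundancy_factor(sr_v))
```

Under this model both the buffer size and the external traffic grow with the vertical search range, which is why a large vertical range is costly for HD sequences.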