As the next generation of video coding standard, High Efficiency Video Coding (HEVC) aims to reduce 50% bit rates in comparison with previous video coding standards. In order to increase the Deblocking Filter (DBF) throughput, we propose a memory of ping-pong and interlacing VLSI architecture to prevent DBF from unnecessarily waiting for pixels in both vertical and horizontal, which only takes 435 cycles at worst to process a LCU of 64x64 pixels size. Based on the memory organization, a four stage pipeline with a PreFilter was proposed to eliminate the data dependence in the filter processing and makes it working on 318M possible. As a result, our design can support 8kx4k@90fps real-time applications with SMIC 0.13um technology at the cost of 62.9k gates.