design considerations for HD video encoder architecture, focusing on algorithm and architecture design for crucial modules, including integer and fractional pixel motion estimation, mode decision, and modules that suffer from data dependency, such as intra prediction and motion vector prediction.
High Definition Video Encoder Hardware Implementation
VLSI Implementation
AVS and H.264/AVC video encoders may be implemented on software platforms such as a general-purpose CPU, DSP, or multi-core processor, or on hardware platforms such as FPGA (Field Programmable Gate Array) and ASIC (Application Specific Integrated Circuit). For an efficient HD video encoder, FPGA and ASIC are well-suited platforms for VLSI implementation. These platforms offer large hardware computation resources (macrocells or hardware gates) and on-chip storage (SRAM), both of which are indispensable for professional HD MPEG-like video encoder implementation.
The hardware architectures for MPEG-4 video encoders were reviewed in [11]. Several intra-frame-only encoder architectures were reported in [12]-[14]. The predominant VLSI architectures for HD H.264/AVC encoders have also been reported in the literature. However, further optimization of algorithms and architectures remains important and urgent.
Design Challenges
HD video encoder architecture design faces several challenges: ultra-high complexity and throughput, high external memory bandwidth and on-chip SRAM consumption, hardware data dependency, and complex prediction structures. Moreover, the trade-off among multiple performance targets should be taken into consideration.
The first challenge is complexity and throughput. H.264 and AVS require much higher computational complexity than previous standards, especially for HDTV applications. Several coding tools improve coding performance at the cost of high computational complexity: complex temporal prediction with multiple reference frames (MRF), fractional motion vector (MV) accuracy, and variable block size motion estimation (VBSME); intra prediction with multiple prediction modes; Lagrangian mode decision (MD); and context-based adaptive binary arithmetic coding (CABAC). As a result, the required processing throughput is very high. Taking 1080P@30Hz as an example, there are 8160 macroblocks (MBs) in one frame, so the MB pipeline must sustain 244800 MBs per second. In the QFHD@30fps format, the throughput is four times that of 1080P@30fps. In the state-of-the-art architectures [15]-[21], the average MB pipeline interval generally varies from 100 to 500 cycles. Under this constraint, the architecture designs for integer motion estimation (IME) with a large search range and for fractional motion estimation (FME) with multiple modes are both major challenges.
The second challenge is the sequential processing flow and data dependency. There are frame-level, MB-level, and block-level data dependencies. The frame-level dependencies due to I, P,
Algorithm and VLSI Architecture Design for MPEG-Like High Definition Video Coding-AVS Video Coding from Standard Specif...
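The throughput figures quoted for 1080P@30Hz can be checked with a short calculation. The following is a minimal sketch, not part of the original text. It assumes 16x16 macroblocks with frame dimensions rounded up to a whole number of macroblocks (so 1080 lines are coded as 1088), assumes QFHD means 3840x2160, and assumes a single MB pipeline that completes one macroblock per pipeline interval. The function names are hypothetical and chosen for illustration only.

```python
MB_SIZE = 16  # macroblock edge length in luma samples (assumed 16x16)


def macroblocks_per_frame(width, height):
    """Count 16x16 macroblocks, rounding each dimension up to a whole MB."""
    mbs_wide = (width + MB_SIZE - 1) // MB_SIZE
    mbs_high = (height + MB_SIZE - 1) // MB_SIZE
    return mbs_wide * mbs_high


def mb_throughput(width, height, fps):
    """Macroblocks that must be processed per second for real-time encoding."""
    return macroblocks_per_frame(width, height) * fps


def min_clock_hz(width, height, fps, cycles_per_mb):
    """Lowest clock frequency that sustains the rate, assuming a single
    MB pipeline that finishes one macroblock every cycles_per_mb cycles."""
    return mb_throughput(width, height, fps) * cycles_per_mb


if __name__ == "__main__":
    print(macroblocks_per_frame(1920, 1080))   # 120 * 68 = 8160
    print(mb_throughput(1920, 1080, 30))       # 8160 * 30 = 244800
    print(mb_throughput(3840, 2160, 30))       # 240 * 135 * 30 = 972000
    for cycles in (100, 500):
        mhz = min_clock_hz(1920, 1080, 30, cycles) / 1e6
        print(f"{cycles} cycles/MB -> {mhz:.2f} MHz")
```

Under these assumptions, the 100 to 500 cycle pipeline interval corresponds to a minimum clock of about 24.5 MHz to 122.4 MHz for 1080P@30Hz, and the QFHD rate comes out at roughly 3.97 times the 1080P rate, consistent with the factor of four stated in the text.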