The computational module of several MPEG-based video encoders, which comprises the well-known Discrete Cosine Transform, Hadamard Transform and Quantization algorithms, is widely used to identify and compress spatial redundancy in intra (raw input) or inter (computed residue) pixel data matrices. For some modern multimedia applications, such as high-definition (HD H.264/AVC) or scalable (H.264/SVC) encoder solutions, fast implementations of this module become critical. Practical experiments indicate that, inside an H.264 computational module, the quantization stage is normally the real bottleneck for fast hardware implementations.
With this in mind, we propose a complete integrated H.264 computational module that incorporates the direct and inverse Discrete Cosine Transform, Hadamard Transform and Quantization algorithms with minimal communication delays. This paper also presents a practical study of distinct levels of parallelism in the quantization stage, demonstrating how that parallelism influences overall encoder complexity and performance. All proposed alternatives were designed in the VHDL hardware description language and implemented on commercial FPGA boards to obtain experimental results.
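As a rough software illustration of the data path summarized above, the following Python sketch models the standard H.264 4x4 forward integer transform followed by scalar quantization. It is not the proposed VHDL design: the function names, the `lanes` parameter and the one-group-per-cycle timing model are assumptions introduced here only to show how the degree of quantizer parallelism trades hardware multipliers for cycles.

```python
# Software reference model (assumed, illustrative only): H.264 4x4 forward
# integer transform plus quantization, with a toy model of parallel lanes.

# Forward core transform matrix of the H.264 4x4 integer DCT approximation.
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]

# Multiplication factors indexed by QP % 6. Each tuple holds the factor for
# (even row, even col), (odd row, odd col) and all remaining positions.
MF = [(13107, 5243, 8066), (11916, 4660, 7490), (10082, 4194, 6554),
      (9362, 3647, 5825), (8192, 3355, 5243), (7282, 2893, 4559)]


def matmul4(a, b):
    """Multiply two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def forward_transform(x):
    """Compute W = CF * X * CF^T for a 4x4 block X."""
    cf_t = [list(row) for row in zip(*CF)]
    return matmul4(matmul4(CF, x), cf_t)


def mf_for(qp, i, j):
    """Select the multiplication factor for coefficient position (i, j)."""
    even_even, odd_odd, other = MF[qp % 6]
    if i % 2 == 0 and j % 2 == 0:
        return even_even
    if i % 2 == 1 and j % 2 == 1:
        return odd_odd
    return other


def quantize_block(w, qp, intra=True, lanes=1):
    """Quantize a 4x4 coefficient block.

    `lanes` is a hypothetical parameter: the number of coefficients handled
    per modelled clock cycle. Returns (quantized block, cycle count).
    """
    qbits = 15 + qp // 6
    f = (1 << qbits) // (3 if intra else 6)  # rounding offset
    positions = [(i, j) for i in range(4) for j in range(4)]
    z = [[0] * 4 for _ in range(4)]
    cycles = 0
    for start in range(0, 16, lanes):
        for i, j in positions[start:start + lanes]:
            mag = (abs(w[i][j]) * mf_for(qp, i, j) + f) >> qbits
            z[i][j] = -mag if w[i][j] < 0 else mag
        cycles += 1  # one group of `lanes` coefficients per modelled cycle
    return z, cycles
```

In this toy model, calling `quantize_block(w, qp, lanes=1)` takes 16 modelled cycles per block, while `lanes=16` takes a single cycle at the cost of sixteen multipliers; the quantized values are identical in both cases.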