This paper presents an optimized hardware architecture of the inverse quantization and the inverse transform (IQ/IT) for a high-efficiency video coding (HEVC) decoder. Our highly parallel and pipelined architecture was designed to support all HEVC Transform Unit (TU) sizes: 4 × 4, 8 × 8, 16 × 16 , and 32 × 32. The IQ/IT was described in the VHSIC hardware description language and synthesized to Xilinx XC7Z020 field-programmable gate array (FPGA) and to TSMC 180 nm standard-cell library. The throughput of the hardware architecture reached in the worst case a processing rate of up to 1080 p at 33 fps at 146 MHz and 1080 p at 25 fps at 110 MHz when mapped to FPGA and standardcells, respectively. The validation of our architecture was conducted on the ZC702 platform using a Software/Hardware (SW/HW) enviro nment in order to evaluate different implementation methods (SW and SW/HW) in terms of power consumption and run-time. The experimental results demonstrate that the SW/HW accelerations were enhanced by more than 70% in terms of the run-time speed relative to the SW solution. Besides, the power consumption of the SW/HW designs was reduced by nearly 60% compared with the SW case.