This paper presents a hardware design capable of supporting high-efficiency video coding inverse discrete cosine transform (IDCT) with a 32 × 32 transform unit size, using a single 1-D IDCT core with transpose memory to reduce costs. The proposed 1-D IDCT core employs 16 computation paths for high throughput and is implemented using distributed arithmetic to facilitate the sharing of hardware resources. The proposed 1-D IDCT is capable of calculating 1-D and 2-D data simultaneously along 32 parallel paths. When implemented using Taiwan Semiconductor Manufacturing Company (TSMC) 40-nm CMOS technology, the proposed 2-D transform core provides throughput of 6.4 gigapixels/s with a gate count of 335 k. The results show that a superior hardware efficiency can be achieved in the proposed 32-point IDCT core compared with the existing works.