Abstract: A new fully parallel architecture for the computation of a two-dimensional (2D) discrete cosine transform (DCT), based on the row-column decomposition, is presented. It uses the same one-dimensional (1D) DCT unit for the row and column computations and (N² + N) registers to perform the transposition. It possesses features of regularity and modularity, and is thus well suited for VLSI implementation. It can be used for the computation of either the forward or the inverse 2D DCT.

Each 1D DCT unit uses N fully parallel vector inner product (VIP) units. The design of the VIP units is based on a systematic design methodology using radix-2ⁿ arithmetic, which allows partitioning of the elements of each vector into small groups. Array multipliers without the final adder are used to produce the different partial product terms. This allows a more efficient use of 4:2 compressors for the accumulation of the products in the intermediate stages and reduces the number of accumulators from N to 1. Using this procedure, the 2D DCT architecture requires fewer than N² multipliers (in terms of area occupied) and only 2N adders. It can compute an N×N-point DCT at a rate of one complete transform per N cycles after an appropriate initial delay.

Introduction:
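For orientation, the row-column decomposition named in the abstract can be sketched in software as follows. This is a minimal NumPy illustration, not the hardware architecture itself: it assumes the orthonormal DCT-II normalization, and the function names (`dct_matrix`, `dct_1d_rows`, `dct_2d`) are hypothetical. The same 1D routine is applied twice with a transposition in between, and each output of the 1D step is one vector inner product between an input row and a row of cosine coefficients.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C (assumed normalization):
    C[k, i] = a(k) * cos((2i + 1) * k * pi / (2n))."""
    k = np.arange(n).reshape(-1, 1)   # frequency index, one per row
    i = np.arange(n).reshape(1, -1)   # sample index, one per column
    c = np.sqrt(2.0 / n) * np.cos((2 * i + 1) * k * np.pi / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)        # scale factor for the DC row
    return c

def dct_1d_rows(block, c):
    """Apply the 1D transform to every row of `block`.
    Each output element is one inner product of a row with a row of c."""
    return block @ c.T

def dct_2d(block, inverse=False):
    """2D transform by row-column decomposition, reusing one 1D routine."""
    n = block.shape[0]
    c = dct_matrix(n)
    if inverse:
        c = c.T                       # C is orthonormal, so its inverse is C^T
    rows_done = dct_1d_rows(block, c)         # first pass: rows
    transposed = rows_done.T                  # transposition step
    cols_done = dct_1d_rows(transposed, c)    # second pass: former columns
    return cols_done.T                        # restore original orientation

if __name__ == "__main__":
    x = np.arange(64, dtype=float).reshape(8, 8)
    y = dct_2d(x)
    x_back = dct_2d(y, inverse=True)
    print(np.allclose(x, x_back))
```

Under these assumptions the forward result equals `C @ x @ C.T`, and passing `inverse=True` swaps in the transposed coefficient matrix, which mirrors the abstract's point that one structure serves both the forward and the inverse transform.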