This paper presents a memory-efficient design for realizing cyclic convolution and its application to the discrete cosine transform (DCT). We adopt distributed arithmetic computation and exploit the symmetry of the DCT coefficients to merge elements of the DCT kernel matrix, then split the kernel into two perfect cyclic forms. This facilitates an efficient realization of the 1-D N-point DCT using (N-1)/2 adders or subtractors, one small ROM module, a barrel shifter, and (N-1)/2 + 1 accumulators. Comparison with existing designs shows that the proposed design significantly reduces the delay-area product.
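To make the role of the ROM and the barrel shifter concrete, the following is a minimal software sketch of distributed-arithmetic cyclic convolution. It is an illustration under stated assumptions, not the proposed architecture: inputs are taken to be unsigned integers of a fixed bit width, the function names (build_rom, rotl, cyclic_conv_da) are hypothetical, and the DCT-specific symmetry merging and kernel splitting are not modeled. The point it shows is that, because the kernel is cyclic, every output can read the same ROM of partial sums; only the address is rotated from one output to the next.

```python
def build_rom(h):
    """ROM of partial sums: entry addr holds the sum of h[j] over the set bits j of addr."""
    n = len(h)
    return [sum(h[j] for j in range(n) if (addr >> j) & 1)
            for addr in range(1 << n)]


def rotl(addr, shift, width):
    """Rotate a width-bit address left by shift positions (software barrel shifter)."""
    shift %= width
    mask = (1 << width) - 1
    return ((addr << shift) | (addr >> (width - shift))) & mask


def cyclic_conv_da(h, x, bits):
    """Cyclic convolution y[i] = sum_k h[(i-k) mod n] * x[k] via distributed arithmetic.

    Assumption: every x[k] is an unsigned integer below 2**bits.
    """
    n = len(h)
    rom = build_rom(h)
    y = [0] * n  # one accumulator per output
    for b in range(bits):
        # Address for output 0: bit j carries bit b of x[(-j) mod n].
        addr0 = 0
        for j in range(n):
            addr0 |= ((x[(-j) % n] >> b) & 1) << j
        # Output i reuses the same ROM with the address rotated by i.
        for i in range(n):
            y[i] += rom[rotl(addr0, i, n)] * (1 << b)
    return y


def cyclic_conv_direct(h, x):
    """Reference implementation used only to check the sketch."""
    n = len(h)
    return [sum(h[(i - k) % n] * x[k] for k in range(n)) for i in range(n)]
```

For example, with h = [3, -1, 4, 2], x = [5, 0, 7, 1] and bits = 3, cyclic_conv_da(h, x, 3) agrees with cyclic_conv_direct(h, x). The sketch stores a full 2^N-entry table for clarity, so it does not model how the design keeps the ROM small.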