This paper presents a memory-efficient approach to realize the cyclic convolution and its application to the discrete cosine transform (DCT). We adopt the way of distributed arithmetic (DA) computation, exploit the symmetry property of DCT coefficients to merge the elements in the matrix of DCT kernel, separate the kernel to be two perfect cyclic forms, and partition the content of ROM into groups to facilitate an efficient realization of a one-dimensional (1-D) -point DCT kernel using ( -1)/2 adders or substractors, one small ROM module, a barrel shifter, and (( 1) 2) + 1 accumulators. The proposed memory-efficient design technique is characterized by rearranging the content of the ROM using the conventional DA approach into several groups such that all the elements in a group can be accessed simultaneously in accumulating all the DCT outputs for increasing the ROM utilization. Considering an example using 16-bit coefficients, the proposed design can save more than 57% of the delayarea product, as compare with the existing DA-based designs in the case of the 1-D seven-point DCT. Finally, a 1-D DCT chip was implemented to illustrate the efficiency associated with the proposed approach.