This paper presents a cost-effective 2D-DCT processor based on a fast row/column decomposition approach. With a particular computation schedule, the processor does not require a transpose memory for 2D-DCT computation. We rearrange the cosine coefficients of the first and second 1D-DCT transformations to keep the DC coefficient error-free. The new architecture generates the cosine coefficients with state machines rather than a ROM table, which saves the memory cells and the address generator. For an 8×8 DCT realization, the circuit needs only 36 adders and no multipliers, and the whole chip uses about 19k transistors. The chip area is about 4 mm² in a TSMC 0.35 µm CMOS process. The circuit complexity is only about one-third to one-fifth that of conventional DCT chips.
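As an illustration of the row/column decomposition mentioned above, the following minimal sketch computes an 8×8 2D-DCT as two passes of a 1D-DCT and compares it with the direct 2D definition. It is a floating-point software model only, assuming the orthonormal DCT-II; it does not reproduce the adder-only datapath, the state-machine coefficient generation, or the schedule that removes the transpose memory. The function names (`dct_1d`, `dct_2d_row_column`, `dct_2d_direct`) are hypothetical, and the explicit `transpose` calls stand in for the transpose memory of a conventional design.

```python
import math

N = 8  # block size, matching the 8x8 case in the abstract


def dct_1d(x):
    """Orthonormal 1D DCT-II of a sequence (direct O(n^2) form, assumed normalization)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n))
        out.append(scale * acc)
    return out


def transpose(m):
    """Swap rows and columns; plays the role of a transpose memory in this model."""
    return [list(col) for col in zip(*m)]


def dct_2d_row_column(block):
    """2D DCT as a 1D DCT over every row, then a 1D DCT over every column."""
    row_pass = [dct_1d(row) for row in block]
    col_pass = [dct_1d(col) for col in transpose(row_pass)]
    return transpose(col_pass)


def dct_2d_direct(block):
    """2D DCT from the separable definition, used only as a reference."""
    n = len(block)

    def c(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            acc = 0.0
            for i in range(n):
                for j in range(n):
                    acc += (block[i][j]
                            * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                            * math.cos((2 * j + 1) * v * math.pi / (2 * n)))
            out[u][v] = c(u) * c(v) * acc
    return out


if __name__ == "__main__":
    # Arbitrary test block, chosen only for illustration.
    block = [[(i * N + j) % 17 for j in range(N)] for i in range(N)]
    a = dct_2d_row_column(block)
    b = dct_2d_direct(block)
    max_diff = max(abs(a[u][v] - b[u][v]) for u in range(N) for v in range(N))
    print("max difference between row/column and direct 2D DCT:", max_diff)
```

The row/column form needs 2N 1D transforms of length N instead of the four nested loops of the direct form, which is why hardware designs commonly build on it. In a conventional design the intermediate `row_pass` result is what the transpose memory would store.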