Abstract-The two-dimensional discrete cosine transform (2-D DCT) has been widely recognized as a key processing unit for image data compression/decompression. This paper presents the implementation of a 200 MHz, 13.3 mm² 8 × 8 2-D DCT macrocell capable of HDTV rates, based on a direct realization of the DCT and using distributed arithmetic. The macrocell, fabricated using 0.8 μm base-rule CMOS technology and 0.5 μm MOSFET's, performs the DCT processing with a throughput of 1 sample (pixel) per clock. The high speed and small area are achieved by a novel sense-amplifying pipeline flip-flop (SA-F/F) circuit technique in combination with nMOS differential logic. The SA-F/F, a class of delay flip-flops, can be used as a differential synchronous sense amplifier, and can amplify dual-rail inputs with swings lower than 100 mV. A 1.6 ns 20 bit carry-skip adder used in the DCT macrocell, designed with the same scheme, is also described. The adder is 50% faster and 30% smaller than a conventional CMOS carry-lookahead adder, reducing the macrocell size by 15% compared to a conventional CMOS implementation.
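As a rough illustration of the distributed arithmetic mentioned above, the following minimal sketch computes a fixed-coefficient inner product bit-serially from a lookup table instead of with multipliers. It is not the macrocell's actual datapath: the function names `build_lut` and `da_inner_product`, the 12-bit word length, and the integer coefficients are all assumptions chosen for illustration.

```python
def build_lut(coeffs):
    """Precompute every partial sum of the fixed coefficients.

    Entry `addr` holds the sum of coeffs[k] over all k whose bit is set in addr.
    (Hypothetical helper; coefficients are assumed to be fixed-point integers.)
    """
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << n)]


def da_inner_product(coeffs, samples, bits=12):
    """Inner product sum(c_k * x_k) by distributed arithmetic.

    Each sample is assumed to fit in `bits`-bit two's complement.
    One LUT lookup and one shifted accumulate are done per bit plane.
    """
    lut = build_lut(coeffs)
    mask = (1 << bits) - 1
    words = [x & mask for x in samples]  # two's complement bit patterns
    acc = 0
    for j in range(bits):
        # Gather bit j of every sample into one LUT address.
        addr = 0
        for k, w in enumerate(words):
            addr |= ((w >> j) & 1) << k
        term = lut[addr] << j
        # The most significant bit carries negative weight in two's complement.
        acc += -term if j == bits - 1 else term
    return acc


if __name__ == "__main__":
    coeffs = [3, -5, 7, 2]        # illustrative values, not DCT coefficients
    samples = [10, -4, 0, -100]
    print(da_inner_product(coeffs, samples))
```

The number of lookups depends on the word length rather than on the number of multiplications, which is the property that makes the technique attractive for fixed-coefficient transforms such as the DCT.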