Real-time video compression is a challenging subject for FPGA implementation because it is typically computationally complex and requires high data throughput. Previous implementations have used parallel banks of FPGAs or DSPs [1,2,3] to meet these requirements. Using design techniques that maximize FPGA utilization, we have implemented two video compression systems, each of which uses a single FPGA. In the first system, algorithmic optimizations are made to create a low-complexity implementation that exploits the in-system programmability of the FPGA. This low-complexity implementation performs well, but is limited to a single compression algorithm. In the second system, the FPGA is augmented with an external, low-complexity video signal processor (VSP) [4,5,6]. This combination of ASIC and FPGA is flexible enough to implement four common compression algorithms, and powerful enough to execute them in real time.
Introduction
Video compression is used in many applications to reduce the amount of information required to represent a sequence of images. Applications range from high-definition television, with a compressed data rate of several Mbits/sec, to low-power wireless video transmission at several tens of kbits/sec. Many compression algorithms are available, and the choice depends on application requirements such as the amount of compression required, the required resiliency to errors, and the type of video source. This wide range of compression techniques makes programmable implementations especially attractive.
Video processing typically requires high data throughput and has high computational complexity. For example, the discrete cosine transform (DCT) [7] is the basis of many compression systems. An efficient algorithm for computing the DCT on 8 x 8 blocks of a 15 frames/sec video sequence with 256 x 256 byte frames requires a 24 MHz multiplier and a 55 MHz adder. Since the DCT is usually followed by other algorithms such as run-length encoding (RLE) and Huffman coding, the DCT may require multiplications at rates above 50 MHz. These data rates are beyond those achievable by most DSP chips, and are challenging for FPGAs.
Parallel banks of FPGAs and DSPs have been used to prototype some video processing routines successfully [1,2,3], but implementation on a single FPGA might lead to a more cost-effective system. In this paper, we describe two implementations that each use a single FPGA. In the first system, the video compression algorithm is designed for a low-complexity implementation using a single in-system reprogrammable FPGA. Optimizing the algorithm to fit the system results in an efficient implementation, but the system is limited to that single algorithm. In the second implementation, the FPGA is augmented with an external processor. While the first system demonstrates techniques to reduce the algorithm complexity, the second system demonstrates techniques to increase the computational power of the system.
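To put the throughput figures quoted for the DCT example in context, the following back-of-envelope sketch computes the pixel and block rates for the stated video format and the number of operations per pixel that the quoted rates would imply. It assumes one byte per pixel and reads "24 MHz multiplier" and "55 MHz adder" as 24 and 55 million operations per second. These readings, and all names in the code, are assumptions made for illustration. The text does not say how the figures were derived, so the per-pixel counts are only what those rates imply under these assumptions.

```python
# Back-of-envelope throughput check for the DCT example in the introduction.
# Assumptions (not stated in the paper): one byte per pixel, and the quoted
# multiplier/adder rates are read as operations per second.

FRAME_WIDTH = 256        # pixels per row
FRAME_HEIGHT = 256       # rows per frame
FRAMES_PER_SEC = 15      # frame rate of the example sequence
BLOCK_SIZE = 8           # DCT operates on 8 x 8 blocks
MULT_RATE_HZ = 24e6      # quoted multiplier rate
ADD_RATE_HZ = 55e6       # quoted adder rate


def throughput_summary():
    """Return pixel/block rates and the operations per pixel implied by the quoted rates."""
    pixels_per_sec = FRAME_WIDTH * FRAME_HEIGHT * FRAMES_PER_SEC
    blocks_per_frame = (FRAME_WIDTH // BLOCK_SIZE) * (FRAME_HEIGHT // BLOCK_SIZE)
    blocks_per_sec = blocks_per_frame * FRAMES_PER_SEC
    return {
        "pixels_per_sec": pixels_per_sec,
        "blocks_per_sec": blocks_per_sec,
        "implied_mults_per_pixel": MULT_RATE_HZ / pixels_per_sec,
        "implied_adds_per_pixel": ADD_RATE_HZ / pixels_per_sec,
    }


if __name__ == "__main__":
    for name, value in throughput_summary().items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions the input is just under one million pixels per second, so sustaining the quoted arithmetic rates means tens of operations for every incoming pixel.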
FPGA Implementation Using Low-Complexity Video Algorithms
W...