An orthogonal 16-point approximate discrete cosine transform (DCT) is introduced. The proposed transform requires neither multiplications nor bit-shifting operations. A fast algorithm based on matrix factorization is also presented, requiring only 44 additions, the lowest arithmetic cost in the literature. To assess the introduced transform, computational complexity, similarity with the exact DCT, and coding performance measures are computed. Classical and state-of-the-art 16-point low-complexity transforms were used in a comparative analysis. In the context of image compression, the proposed approximation was evaluated via PSNR and SSIM measurements, attaining the best cost-benefit ratio among the competitors. For video encoding, the proposed approximation was embedded into the HEVC reference software for direct comparison with the original HEVC standard. Physically realized and tested using FPGA hardware, the proposed transform showed 35% and 37% improvements in the area-time and area-time-squared VLSI metrics, respectively, when compared to the best competing transform in the literature.
Introduction

The discrete cosine transform (DCT) [1, 2] is a fundamental building block for several image and video processing applications. In fact, the DCT closely approximates the Karhunen-Loève transform (KLT) [1], which is capable of optimal data decorrelation and energy compaction of first-order stationary Markov signals [1]. This class of signals is particularly appropriate for the modeling of natural images [1,3]. Thus, the DCT finds applications in several contemporary image and video compression standards, such as JPEG [4] and the H.26x family of codecs [5][6][7]. Indeed, several fast algorithms for computing the exact DCT were proposed [8][9][10][11][12][13][14][15]. However, these methods require the use of arithmetic multipliers [16,17], which are time-, power-, and hardware-demanding when compared to additions or bit-shifting operations [18].
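The relationship between the DCT and the KLT can be illustrated numerically. The following is a minimal sketch, not part of the proposed method: it builds the exact orthonormal 16-point DCT-II matrix, applies it to the covariance matrix of a first-order stationary Markov signal, and compares the resulting transform coding gain with that of the KLT. The correlation coefficient `rho = 0.95` is an assumption chosen for illustration, and all function names are hypothetical. The KLT gain uses the known closed-form determinant of this covariance matrix, (1 - rho^2)^(N-1).

```python
import math


def dct_matrix(n):
    """Exact orthonormal DCT-II matrix; row k holds the k-th basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                     for j in range(n)])
    return rows


def markov_covariance(n, rho):
    """Covariance of a unit-variance first-order Markov signal: R[i][j] = rho^|i-j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


def transformed_variances(c, r):
    """Diagonal of C R C^T, i.e. the variance of each transform coefficient."""
    n = len(c)
    variances = []
    for k in range(n):
        # (R c_k)[i] for the k-th basis vector c_k
        rc = [sum(r[i][j] * c[k][j] for j in range(n)) for i in range(n)]
        variances.append(sum(c[k][i] * rc[i] for i in range(n)))
    return variances


def coding_gain_db(variances):
    """Coding gain of an orthogonal transform: arithmetic over geometric mean, in dB."""
    n = len(variances)
    arithmetic = sum(variances) / n
    geometric = math.exp(sum(math.log(v) for v in variances) / n)
    return 10.0 * math.log10(arithmetic / geometric)


def klt_coding_gain_db(n, rho):
    """KLT coding gain for the same model, using det(R) = (1 - rho^2)^(n - 1)."""
    geometric = (1.0 - rho * rho) ** ((n - 1) / n)
    return -10.0 * math.log10(geometric)


n, rho = 16, 0.95  # rho is an illustrative assumption
variances = transformed_variances(dct_matrix(n), markov_covariance(n, rho))
print("DCT coding gain (dB):", round(coding_gain_db(variances), 4))
print("KLT coding gain (dB):", round(klt_coding_gain_db(n, rho), 4))
print("Energy share of first coefficient:", round(variances[0] / sum(variances), 4))
```

Under these assumptions the two gains should differ only slightly, with the KLT value acting as an upper bound for any orthogonal transform. Note that every entry of the exact DCT matrix is an irrational cosine value, which is why exact fast algorithms still need multipliers and why multiplierless approximations are of interest.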