International audienceApproximation of discrete cosine transform (DCT) is useful for reducing its computational complexity without significant impact on its coding performance. Most of the existing algorithms for approximation of the DCT target only the DCT of small transform lengths, and some of them are non-orthogonal. This paper presents a generalized recursive algorithm to obtain orthogonal approximation of DCT where an approximate DCT of length N could be derived from a pair of DCTs of length (N/2) at the cost of N additions for input preprocessing. We perform recursive sparse matrix decomposition and make use of the symmetries of DCT basis vectors for deriving the proposed approximation algorithm. Proposed algorithm is highly scalable for hardware as well as software implementation of DCT of higher lengths, and it can make use of the existing approximation of 8-point DCT to obtain approximate DCT of any power of two length, N > 8. We demonstrate that the proposed approximation of DCT provides comparable or better image and video compression performance than the existing approximation methods. It is shown that proposed algorithm involves lower arithmetic complexity compared with the other existing approximation algorithms. We have presented a fully scalable reconfigurable parallel architecture for the computation of approximate DCT based on the proposed algorithm. One uniquely interesting feature of the proposed design is that it could be configured for the computation of a 32-point DCT or for parallel computation of two 16-point DCTs or four 8-point DCTs with a marginal control overhead. The proposed architecture is found to offer many advantages in terms of hardware complexity, regularity and modularity. Experimental results obtained from FPGA implementation show the advantage of the proposed method
We have suggested a new data-access scheme for the computation of lifting two-dimensional (2-D) discrete wavelet transform (DWT) without using data transposition. We have derived a linear systolic array directly from the dependence graph (DG) and a 2-D systolic array from a suitably segmented DG for parallel and pipeline implementation of 1-D DWT. These two systolic arrays are used as building blocks to derive the proposed transposition-free structure for lifting 2-D DWT. The proposed structure requires only a small on-chip memory of (4N + 8P ) words and processes a block of P samples in every cycle, where N is the image width. Moreover, it has small output latency of nine cycles and does not require control signals which are commonly used in most of the existing DWT structures. Compared with the best of the existing high-throughput structures, the proposed structure requires the same arithmetic resources but involves 1.5N less on-chip memory and offers the same throughput rate. ASIC synthesis result shows that the proposed structure for block size 8 and image size 512 × 512 involves 28% less area, 35% less area-delay product, and 27% less energy per image than the best of the corresponding existing structures. Apart from that, the proposed structure is regular and modular; and it can be easily configured for different block sizes.Index Terms-Block processing, discrete wavelet transform (DWT), lifting, systolic VLSI, 2-D DWT.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.