An algebraic integer (AI) based time-multiplexed row-parallel architecture and two final-reconstruction step (FRS) algorithms are proposed for the implementation of the bivariate AI-encoded 2-D discrete cosine transform (DCT). The architecture directly realizes an error-free 2-D DCT without using FRSs between the row and column transforms, leading to an 8×8 2-D DCT that is entirely free of quantization errors in the AI basis. As a result, the accuracy of each coefficient is user-selectable in the FRS: each of the 64 coefficients can have its precision set independently of the others, avoiding the leakage of quantization noise between channels that occurs in published DCT designs. The proposed FRS uses two approaches based on (i) optimized Dempster-Macleod multipliers and (ii) expansion factor scaling. This architecture enables low-noise, high-dynamic-range applications in digital video processing that require full control of the finite-precision computation of the 2-D DCT. The proposed architectures and FRS techniques are experimentally verified and validated using hardware implementations that are physically realized on an FPGA chip. Six designs, for 4- and 8-bit input word sizes, using the two proposed FRS schemes, have been designed, simulated, physically implemented and measured. The maximum clock rate and block rate achieved among the 8-bit input designs are 307.787 MHz and 38.47 MHz, respectively, implying a pixel rate of 8×307.787 MHz ≈ 2.462 GHz if eventually embedded in a real-time video-processing system. The equivalent frame rate is about 1187.35 Hz for an image size of 1920×1080. All implementations are functional on a Xilinx Virtex-6 XC6VLX240T FPGA device.
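The throughput figures quoted above follow from simple arithmetic on the clock rate. The sketch below is only an illustration, assuming that a row-parallel 8×8 design accepts one row of 8 pixels per clock cycle, so that one block takes 8 cycles. The function name `throughput` and its parameters are hypothetical and do not come from the paper.

```python
def throughput(clock_hz, n=8, frame_w=1920, frame_h=1080):
    """Derive pixel, block and frame rates from the clock rate.

    Assumption (not stated in this form by the source): one row of
    n pixels enters the n x n transform on every clock cycle.
    """
    pixel_rate = n * clock_hz                      # pixels per second
    block_rate = clock_hz / n                      # n x n blocks per second
    frame_rate = pixel_rate / (frame_w * frame_h)  # frames per second
    return pixel_rate, block_rate, frame_rate


# Maximum measured clock rate of the 8-bit FPGA designs.
pixel_rate, block_rate, frame_rate = throughput(307.787e6)
print(f"pixel rate: {pixel_rate / 1e9:.3f} GHz")
print(f"block rate: {block_rate / 1e6:.2f} MHz")
print(f"frame rate at 1920x1080: {frame_rate:.1f} Hz")
```

Under this assumption the pixel and block rates match the quoted 2.462 GHz and 38.47 MHz, and the frame rate comes out close to the quoted value.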
Multi-beamforming is an important requirement for broadband space imaging applications based on dense aperture arrays (AAs). Usually, the discrete Fourier transform is the transform of choice for AA electromagnetic imaging. Here, the discrete cosine transform (DCT) is proposed as an alternative, enabling the use of emerging fast algorithms that offer greatly reduced complexity in digital arithmetic circuits. We propose two novel high-speed digital architectures for recently proposed fast algorithms (Bouguezel, Ahmad and Swamy 2008 Electron. Lett. 44 1249–50) (BAS-2008) and (Cintra and Bayer 2011 IEEE Signal Process. Lett. 18 579–82) (CB-2011) that provide good approximations to the DCT at zero multiplicative complexity. Further, we propose a novel DCT approximation with zero multiplicative complexity that is shown to be better for multi-beamforming AAs than BAS-2008 and CB-2011. The far-field array patterns of the ideal DCT, BAS-2008, CB-2011 and the proposed approximation are investigated with error analysis. Extensive hardware realizations, implementation details and performance metrics are provided for synchronous field programmable gate array (FPGA) technology from Xilinx. The resource consumption and speed metrics of BAS-2008, CB-2011 and the proposed approximation are investigated as functions of system word size. The 8-bit versions are mapped to emerging asynchronous FPGAs, leading to significantly increased real-time throughput with clock rates of up to 925.6 MHz, which implies the fastest DCT approximations using reconfigurable logic devices in the literature.
An area-efficient row-parallel architecture is proposed for the real-time implementation of the bivariate algebraic integer (AI) encoded 2-D discrete cosine transform (DCT) for image and video processing. The proposed architecture computes the 8×8 2-D DCT based on the Arai DCT algorithm. An improved fast algorithm for AI-based 1-D DCT computation is proposed along with a single-channel 2-D DCT architecture. The design improves on the recently published 4-channel AI DCT architecture by reducing the number of integer channels to one and the number of 8-point 1-D DCT cores from 5 down to 2. The architecture offers exact computation of 8×8 blocks of 2-D DCT coefficients up to the final reconstruction step (FRS), which converts the coefficients from the AI representation to fixed-point format using the method of expansion factors. Prototype circuits corresponding to FRS blocks based on two expansion factors are realized, tested, and verified on an FPGA chip, using a Xilinx Virtex-6 XC6VLX240T device. Post place-and-route results show a 20% reduction in area compared to the 2-D DCT architecture requiring five 1-D AI cores. The area-time and area-time² complexity metrics are also reduced by 23% and 22%, respectively, for designs with 8-bit input word length. The digital realizations are simulated up to place and route for ASICs using 45 nm CMOS standard cells. The maximum estimated clock rate is 951 MHz for the CMOS realizations, indicating 7.608×10⁹ pixels/second and an 8×8 block rate of 118.875 MHz.
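The idea of computing exactly in an AI representation and quantizing only in the FRS can be illustrated with a deliberately simplified toy. The sketch below assumes a single-variable encoding over numbers of the form a + b√2 with integer a and b. This is not the bivariate basis used in the architecture above, and the names `ZSqrt2` and `frs` are hypothetical. All intermediate arithmetic stays in integers, and the only rounding happens when `frs` maps the result to fixed point.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class ZSqrt2:
    """Exact value a + b*sqrt(2), stored as two integers (toy AI encoding)."""
    a: int
    b: int

    def __add__(self, other):
        return ZSqrt2(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        # (a + b*r)(c + d*r) = (ac + 2bd) + (ad + bc)*r, with r*r = 2
        return ZSqrt2(self.a * other.a + 2 * self.b * other.b,
                      self.a * other.b + self.b * other.a)


def frs(x, frac_bits):
    """Toy final reconstruction step: evaluate and round to frac_bits."""
    scale = 1 << frac_bits
    return round((x.a + x.b * sqrt(2)) * scale) / scale


x = ZSqrt2(1, 1)             # 1 + sqrt(2), held exactly
cube = x * x * x             # still exact: integer operations only
coarse = frs(cube, 4)        # precision is chosen only at the output
fine = frs(cube, 12)
print(cube, coarse, fine)
```

Because the precision is an argument of the final step only, each output could in principle be reconstructed with its own word length, which is the property the abstracts describe for the 64 DCT coefficients.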