In JPEG2000 block coding, all coding passes are generated before rate allocation is performed among the code-blocks; unwanted passes are then discarded. For low bit-rate coding, this means that a large number of coding passes are coded only to be discarded. In this letter, we propose a rate-distortion estimation method that enables pre-compression rate-distortion optimization, so that only the required passes need to be coded. Experiments using the proposed technique demonstrate speedup factors ranging from 1.17 to 1.78 at 0.0625 bpp for JPEG2000 compression.

Introduction: JPEG2000 is a current standard for image compression. An important feature of JPEG2000 is its state-of-the-art low bit-rate performance in terms of decoded image quality [1]. An efficient low bit-rate implementation of JPEG2000 compression is therefore of paramount importance. Block coding in JPEG2000 is based on Embedded Block Coding with Optimized Truncation (EBCOT) [2]. EBCOT operates on small blocks of quantized subband data called code-blocks, coding them in terms of fractional bit-planes called coding passes to generate coded sub-bitstreams. EBCOT achieves compression by discarding the less important coding passes from each sub-bitstream such that the distortion is minimized while the target rate is met. At low
Reconfigurable hybrid processor systems provide a flexible platform for mapping data-parallel applications while offering considerable speedup over software implementations. However, the reconfiguration overhead is a significant deterrent to mapping applications onto reconfigurable hardware. Partial runtime reconfiguration is one approach to reducing this overhead. In this paper, we present a methodology for mapping data-parallel tasks onto hardware that supports partial reconfiguration. The aim is to obtain the maximum possible speedup for a given reconfiguration time, bus speed, and computation speed. The proposed approach uses multiple identical but independent processing units in the reconfigurable hardware. Under nonzero reconfiguration overhead, we show that there is an upper limit on the number of processing units that can be employed, beyond which no further reduction in execution time is possible. We obtain solutions for the minimum processing time, the corresponding load distribution, and the schedule for data transfer. To demonstrate the applicability of the analysis, we present the following: 1) various plots showing the variation of processing time with different parameters; 2) hardware simulations for two examples, viz., the 1-D discrete wavelet transform and a finite impulse response filter, targeted to Xilinx field-programmable gate arrays (FPGAs); and 3) experimental results for a hardware prototype implemented on an FPGA board.
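The claim that nonzero reconfiguration overhead caps the useful number of processing units can be illustrated with a toy model. The sketch below is not the model used in the paper; it rests on simplifying assumptions chosen here for illustration: units are reconfigured one after another, so unit i becomes ready at time i * t_reconf; bus transfer and computation are lumped into a single per-item cost t_item; and the load is arbitrarily divisible. All function and parameter names are hypothetical.

```python
def equal_finish_time(n_items, m, t_reconf, t_item):
    """Common finish time when m units share n_items and all finish together.

    Toy model (assumption): unit i (1-based) is ready at i * t_reconf and then
    spends t_item per data item, so share_i = (T - i * t_reconf) / t_item.
    Summing the shares to n_items and solving for T gives the expression below.
    """
    return (n_items * t_item + t_reconf * m * (m + 1) / 2) / m


def load_shares(n_items, m, t_reconf, t_item):
    """Load assigned to each of the m units so that they finish simultaneously."""
    t_finish = equal_finish_time(n_items, m, t_reconf, t_item)
    return [(t_finish - i * t_reconf) / t_item for i in range(1, m + 1)]


def best_unit_count(n_items, t_reconf, t_item, max_units):
    """Search for the unit count with the smallest finish time in the toy model."""
    best_m = 1
    best_t = equal_finish_time(n_items, 1, t_reconf, t_item)
    for m in range(2, max_units + 1):
        t_finish = equal_finish_time(n_items, m, t_reconf, t_item)
        # Unit m is only ready at m * t_reconf. If the equal-finish time falls
        # before that, its share would be negative, so adding it cannot help.
        if t_finish < m * t_reconf:
            break
        if t_finish < best_t:
            best_m, best_t = m, t_finish
    return best_m, best_t


if __name__ == "__main__":
    m, t = best_unit_count(n_items=1000, t_reconf=50.0, t_item=1.0, max_units=64)
    print("units used:", m, "finish time:", round(t, 3))
    print("shares:", [round(s, 2) for s in load_shares(1000, m, 50.0, 1.0)])
```

With these illustrative numbers the search stops at 6 units even though 64 are allowed: a seventh unit would not be ready before the other six had already finished. Setting t_reconf to zero removes the cap, and the search then uses every available unit.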