A hardware architecture for real-time image compression using a searchless fractal image coding method

Jackson, David Jeff; Ren, Haichen; Wu, Xianwei; Ricks, Kenneth G.

doi:10.1007/s11554-007-0024-2

Cited by 18 publications

(18 citation statements)

References 15 publications

(27 reference statements)


“…From Table 4, it is found that the proposed architecture outperforms the architecture by Vidya [18], in terms of PSNR, encoding time and compression ratio for a larger image size, while both (the proposed and in [18]) architectures employ the same algorithm and the same partitioning scheme. Compared to the architecture by Jackson [8], the saving in execution time can be ascribed to the absence of domain search mechanism, as it employs no-search algorithm. This design requires 1047 clock cycles for encoding a 32 × 32 range block.…”

Section: Experimental Results and Comparison Of Performances (mentioning)

confidence: 99%

“…The domain block position is fixed with respect to range block which again eliminates the requirement of storing the best matching domain index. Therefore, in [8] the compression ratio improvement is basically due to the no-search algorithm. The achieved CR is fixed (i.e., 5.12 and 22.26 for 4 × 4 and 8 × 8, respectively) in the proposed work for any preferred range block size, whereas for [8] it varies nearly from 3 to 28 due to the QT structure.…”

Section: Experimental Results and Comparison Of Performances (mentioning)

confidence: 99%

“…Therefore, in [8] the compression ratio improvement is basically due to the no-search algorithm. The achieved CR is fixed (i.e., 5.12 and 22.26 for 4 × 4 and 8 × 8, respectively) in the proposed work for any preferred range block size, whereas for [8] it varies nearly from 3 to 28 due to the QT structure. Owing to adaptive threshold, PSNR variation is not only a function of range block size, but also dependent on the attained threshold in that level.…”

Section: Experimental Results and Comparison Of Performances (mentioning)

confidence: 99%

“…Selection of square error metric requires a squarer block which includes a multiplier thereby not only increasing the chip area but also increasing the complexity of the design with little gain in PSNR. Jackson [8] presented a full quad-tree searchless method based on the algorithm specified in [7]. This architecture tactfully utilizes the hardware resources to accommodate quad-tree level with range block sizes extending from 32 × 32 to a single pixel.…”

Section: Introduction (mentioning)

confidence: 99%


Low-Delay Parallel Architecture for Fractal Image Compression

Panigrahy; Chakrabarti; Dhar. 2015. Circuits Syst Signal Process


This paper presents an efficient hardware architecture for implementing fractal image compression (FIC) algorithm aimed toward image compression with improved encoding speed. The proposed architecture follows the full-search-based FIC scheme. Parallel processing has been effectively used in the present work to achieve the goal of reducing the time complexity of the encoder. This architecture requires a total of 2n + 2 clock cycles for executing the set of operations consisting of fetching the pixels, calculating the mean of range and domain blocks and doing their mapping, computing the error, and storing the fractal parameter in a memory with n number of pixels in the range block. Further, this architecture does not make use of any preprocessing operations as specified in literature and utilizes the benefits of isometric transformation without requiring additional cycles for every single matching operation. Effective application of isometric transformation has also led to memory reduction of nearly 67 %. Again, in the present work, the use of multipliers has been avoided to save the chip area, to reduce hardware complexity, and to enhance the encoding speed. The operation of transforming contracted domain block with a zero-mean domain block has facilitated relatively fast convergence at the decoder. PSNR above 30dB for a range block of size 4 × 4 has been achieved by the proposed architecture, which is comparable to that realizable by other architectures. The proposed design has been coded in Verilog HDL, has been implemented in Xilinx Virtex-5 FPGA, and operates at a clock frequency of 75.52 MHz.
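The cycle count quoted in this abstract can be turned into a rough timing figure. The sketch below is illustrative only: it assumes the 2n + 2 cycles apply to one range/domain matching operation and that the design runs at the stated 75.52 MHz. The function names are hypothetical and do not come from the cited paper, and total encoding time is not estimated because the number of domain blocks searched is not given here.

```python
# Rough timing sketch based on the figures in the abstract above.
# Assumption: 2n + 2 clock cycles cover one matching operation,
# where n is the number of pixels in the range block.

def cycles_per_match(n_pixels: int) -> int:
    """Clock cycles for one matching operation (2n + 2, per the abstract)."""
    return 2 * n_pixels + 2


def match_time_us(n_pixels: int, clock_mhz: float = 75.52) -> float:
    """Time for one matching operation in microseconds.

    Cycles divided by MHz gives microseconds directly.
    """
    return cycles_per_match(n_pixels) / clock_mhz


if __name__ == "__main__":
    n = 4 * 4  # a 4 x 4 range block has 16 pixels
    print(cycles_per_match(n))  # 34 cycles
    print(f"{match_time_us(n):.2f} us per match")
```

For a 4 × 4 range block this gives 34 cycles, or roughly 0.45 µs per matching operation at the stated clock frequency.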




“…In [17], the memory accesses of the application are simplified by ignoring interframe movements. On the other hand, ignoring inter-frame movement will lead to error blocks with greater magnitude, and therefore lower achievable compression ratios.…”

Section: The QSDPCM Application (mentioning)

confidence: 99%

Custom parallel caching schemes for hardware-accelerated image compression

Ang; Constantinides; Luk; et al. 2008. J Real-Time Image Proc


In an effort to achieve lower bandwidth requirements, video compression algorithms have become increasingly complex. Consequently, the deployment of these algorithms on Field Programmable Gate Arrays (FPGAs) is becoming increasingly desirable, because of the computational parallelism on these platforms as well as the measure of flexibility afforded to designers. Typically, video data is stored in large and slow external memory arrays, but the impact of the memory access bottleneck may be reduced by buffering frequently used data in fast on-chip memories. The order of the memory accesses, resulting from many compression algorithms are dependent on the input data [18]. These data dependent memory accesses complicate the exploitation of data re-use, and subsequently reduce the extent to which an application may be accelerated. In this paper, we present a hybrid memory sub-system which is able to capture data re-use effectively in spite of data dependent memory accesses. This memory sub-system is made up of a custom parallel cache and a scratchpad memory. Further, the framework is capable of exploiting two dimensional spatial locality, which is frequently exhibited in the access patterns of image processing applications. In a case study involving the Quad-tree Structured Pulse Code Modulation (QSDPCM) application, the impact of data dependence on memory accesses is shown to be significant. In comparison with an implementation which only employs an SPM, performance improvements of up to 1.7× and 1.4× are observed through actual implementation on two modern FPGA platforms. These performance improvements are more pronounced for image sequences exhibiting greater inter-frame movements. In addition, reductions of on-chip memory resources by up to 3.2× are achievable using this framework. These results indicate that, on custom hardware platforms,
