Efficient Implementation of Givens QR Decomposition on VLIW DSP Architecture for Orthogonal Matching Pursuit Image Reconstruction

The modified Gram–Schmidt (MGS) algorithm is one of the best-known QR decomposition (QRD) algorithms. It has been used in many signal and image processing applications to solve least-squares problems and linear equations or to invert matrices. However, QRD is regarded as a computationally expensive technique, and its sequential implementation fails to meet the requirements of many real-time applications. In this paper, we propose a new parallel version of the MGS algorithm that uses VLIW (Very Long Instruction Word) resources efficiently to improve performance. The parallel MGS is built from compact VLIW kernels designed for each step of the algorithm, taking architectural and algorithmic constraints into account. Using instruction scheduling and software pipelining techniques, the proposed kernels efficiently exploit data-, instruction- and loop-level parallelism. Additionally, cache memory properties are used to enhance parallel memory access and to avoid cache misses. The robustness, accuracy and speed of the proposed parallel MGS implementation on VLIW significantly enhance the performance of systems under severe real-time and low-power constraints. Experimental results show substantial improvements over the optimized vendor QRD implementation and the state of the art.
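For readers unfamiliar with the algorithm named in the abstract, the following is a minimal sequential sketch of textbook modified Gram–Schmidt QR decomposition. It is not the parallel VLIW implementation the abstract describes; the function name `mgs_qr` and the list-of-rows matrix layout are assumptions made purely for illustration.

```python
import math


def mgs_qr(a):
    """Factor a (m rows, n columns, full column rank) as a = q * r.

    Hypothetical helper for illustration: q has orthonormal columns and
    r is upper triangular. Matrices are plain lists of rows.
    """
    m, n = len(a), len(a[0])
    # Work column by column: v[j] starts as a copy of column j of a.
    v = [[float(a[i][j]) for i in range(m)] for j in range(n)]
    r = [[0.0] * n for _ in range(n)]

    for j in range(n):
        # Normalize the current column to obtain q_j.
        r[j][j] = math.sqrt(sum(x * x for x in v[j]))
        if r[j][j] == 0.0:
            raise ValueError("columns are linearly dependent")
        v[j] = [x / r[j][j] for x in v[j]]

        # The "modified" step: immediately remove the q_j component
        # from every remaining column, using the already-updated columns.
        for k in range(j + 1, n):
            r[j][k] = sum(qx * vx for qx, vx in zip(v[j], v[k]))
            v[k] = [vx - r[j][k] * qx for qx, vx in zip(v[j], v[k])]

    # Convert the orthonormal columns back to a list of rows.
    q = [[v[j][i] for j in range(n)] for i in range(m)]
    return q, r
```

The inner loop over `k` consists of independent dot products and vector updates, which is the kind of data- and loop-level parallelism the abstract says its kernels exploit.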


Novel parallel Givens QR decomposition implementation on VLIW architecture with Efficient memory access for real time image processing applications

Najoui, Hatim, Belkouch

2017

Proceedings of the 2nd International Conference on Big Data, Cloud and Applications


Efficient Implementation of Givens QR Decomposition on VLIW DSP Architecture for Orthogonal Matching Pursuit Image Reconstruction

Cited by 3 publications

References 11 publications

Optimized Implementation of Modified Gram Schmidt Algorithm on VLIW Architecture


Novel Implementation Approach with Enhanced Memory Access Performance of MGS Algorithm for VLIW Architecture

Novel parallel Givens QR decomposition implementation on VLIW architecture with Efficient memory access for real time image processing applications
