Modified Gram–Schmidt (MGS) algorithm is one of the most-known forms of QR decomposition (QRD) algorithms. It has been used in many signal and image processing applications to solve least square problem and linear equations or to invert matrices. However, QRD is well-thought-out as a computationally expensive technique, and its sequential implementation fails to meet the requirements of many real-time applications. In this paper, we suggest a new parallel version of MGS algorithm that uses VLIW (Very Long Instruction Word) resources in an efficient way to get more performance. The presented parallel MGS is based on compact VLIW kernels that have been designed for each algorithm step taking into account architectural and algorithmic constraints. Based on instruction scheduling and software pipelining techniques, the proposed kernels exploit efficiently data, instruction and loop levels parallelism. Additionally, cache memory properties were used efficiently to enhance parallel memory access and to avoid cache misses. The robustness, accuracy and rapidity of the introduced parallel MGS implementation on VLIW enhance significantly the performance of systems under severe rea-time and low power constraints. Experimental results show great improvements over the optimized vendor QRD implementation and the state of art.