QR decomposition (QRD) is a vital component of many baseband processing algorithms and a potential bottleneck for next-generation (5G) high-performance MIMO systems. To further reduce the processing latency (PL) of QRD hardware architectures, this letter proposes a novel anticipated MGS (AMGS) algorithm based on the conventional modified Gram-Schmidt (MGS) algorithm. AMGS introduces anticipated computing to reduce the PL. Moreover, a reciprocal square root (RSR) algorithm is designed to eliminate the complex operations (division and square root), making the AMGS algorithm better suited to baseband processors. To evaluate the proposed AMGS algorithm, a corresponding triangular systolic array (TSA) hardware architecture is implemented; it operates at up to 417 MHz in 0.13 µm CMOS technology and decomposes a 4 × 4 real matrix in 31 clock cycles. The implementation results show that its PL is superior to that of similar works in the literature known to us.
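As a point of reference for the baseline that AMGS builds on, the following is a minimal sketch of conventional MGS in which division and square root are replaced by multiplications with a reciprocal square root. It does not reproduce the anticipated computing or the TSA architecture of this letter. The function names `rsqrt_newton` and `mgs_qr`, the Newton-Raphson iteration, the power-of-four normalization, and the iteration count are all assumptions made for illustration and are not taken from the proposed RSR design.

```python
import math


def rsqrt_newton(x, iterations=7):
    """Approximate 1/sqrt(x) for x > 0 with Newton-Raphson (illustrative assumption).

    The argument is first normalized to m in [0.5, 2) by an even power of two,
    so the fixed seed y = 1 always converges. The loop itself uses only
    multiplications and subtractions.
    """
    if x <= 0.0:
        raise ValueError("x must be positive")
    m, e = math.frexp(x)          # x = m * 2**e with m in [0.5, 1)
    if e % 2:                     # make the exponent even: m in [1, 2)
        m *= 2.0
        e -= 1
    y = 1.0                       # seed for 1/sqrt(m)
    for _ in range(iterations):
        y = y * (1.5 - 0.5 * m * y * y)
    return math.ldexp(y, -(e // 2))   # undo the normalization


def mgs_qr(a):
    """Conventional modified Gram-Schmidt QR of a real matrix given as a list of rows.

    Returns (q, r) with q having orthonormal columns and r upper triangular.
    Assumes the columns of `a` are linearly independent.
    """
    rows, cols = len(a), len(a[0])
    v = [[float(a[i][j]) for i in range(rows)] for j in range(cols)]  # work on columns
    q_cols = []
    r = [[0.0] * cols for _ in range(cols)]
    for k in range(cols):
        norm_sq = sum(x * x for x in v[k])
        inv_norm = rsqrt_newton(norm_sq)
        r[k][k] = norm_sq * inv_norm            # sqrt(s) = s * (1/sqrt(s)), no sqrt call
        qk = [x * inv_norm for x in v[k]]       # normalize by multiplication, no division
        q_cols.append(qk)
        for j in range(k + 1, cols):
            # MGS: project the already-updated column j onto q_k and remove it
            r[k][j] = sum(qi * vi for qi, vi in zip(qk, v[j]))
            v[j] = [vi - r[k][j] * qi for qi, vi in zip(qk, v[j])]
    q = [[q_cols[j][i] for j in range(cols)] for i in range(rows)]
    return q, r


if __name__ == "__main__":
    a = [[4.0, 1.0, 2.0, 0.0],
         [1.0, 3.0, 0.0, 1.0],
         [2.0, 0.0, 5.0, 1.0],
         [0.0, 1.0, 1.0, 3.0]]
    q, r = mgs_qr(a)
    for row in r:
        print(["%.6f" % x for x in row])
```

In this sketch each column needs one reciprocal square root, after which both the diagonal entry of R and the normalized column of Q follow from multiplications alone, which is the kind of simplification the RSR step in the letter is aimed at.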