The extended min-sum (EMS) and improved EMS (I-EMS) algorithms for non-binary low-density parity-check (LDPC) codes over GF(q) significantly reduce decoding complexity with acceptable performance degradation, but they suffer from high latency because they involve many serial computations, including a sorting process. The trellis-based EMS algorithm, on the other hand, greatly reduces latency but does not solve the complexity problem in high-order fields (q ≥ 64). To reduce latency while retaining the low-complexity advantage, the authors propose heap-based EMS (H-EMS) and heap-based I-EMS (HI-EMS) algorithms, which are modifications of the EMS and I-EMS algorithms, respectively. The authors also propose double H-EMS and double HI-EMS algorithms, which trade off latency against performance by heaping messages twice. Numerical results show that the H-EMS algorithm has 2.74 to 9.52 times lower latency than the EMS algorithm with negligible performance degradation over a wide range of code rates, whereas the HI-EMS algorithm has 1.20 to 1.62 times lower latency than the I-EMS algorithm. Furthermore, the proposed algorithms can be employed regardless of the decoding schedule.
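
The abstract does not spell out how the heap is used inside the decoder, so the following is only a minimal sketch of the general idea it alludes to: replacing a full sort with a heap when a length-q message is truncated to its most reliable entries. It assumes that smaller cost means higher reliability (as in min-sum decoding) and that only the n_m best entries are kept. The function names, the parameter `n_m`, and the example costs are hypothetical illustrations, not the authors' H-EMS procedure.

```python
import heapq


def truncate_full_sort(costs, n_m):
    """Baseline: sort all q (cost, symbol) pairs, then keep the n_m smallest."""
    return sorted((c, s) for s, c in enumerate(costs))[:n_m]


def truncate_with_heap(costs, n_m):
    """Heap variant (illustrative assumption): build a min-heap over all
    q pairs, then pop only the n_m smallest instead of ordering everything."""
    heap = [(c, s) for s, c in enumerate(costs)]
    heapq.heapify(heap)  # linear-time heap construction
    return [heapq.heappop(heap) for _ in range(min(n_m, len(heap)))]


# Hypothetical message over GF(8): the index is the field symbol, the value its cost.
example_costs = [3.0, 0.5, 2.0, 0.0, 1.5, 4.0, 2.5, 1.0]
kept = truncate_with_heap(example_costs, 3)
```

Both functions return the same (cost, symbol) pairs in ascending cost order. The heap version avoids fully ordering the entries that are discarded, which is the kind of saving a heap-based selection offers when n_m is much smaller than q.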