Ultra-reliable low-latency communications, URLLC, are designed for applications such as self-driving cars and telesurgery requiring a response in milliseconds and are very sensitive to transmission errors. To match the computational complexity of LDPC decoding algorithms to URLLC applications on IoT devices having very limited computational resources, this paper presents a new parallel and low-latency software implementation of the LDPC decoder. First, a decoding algorithm optimization and a compact data structure are proposed. Next, a parallel software implementation is performed on ARM multicore platforms in order to evaluate the latency of the proposed optimization. The synthesis results highlight a reduction in the memory size requirement by 50% and a three-time speedup in terms of processing time when compared to previous software decoder implementations. The reached decoding latency on the parallel processing platform is 150 μs for 288 bits with a bit error ratio of 3.410–9.