Layered decoding (LD) of Low-Density Parity-Check (LDPC) codes is a decoding schedule that facilitates partially parallel architectures for Belief Propagation (BP)-based iterative algorithms. It has lower implementation complexity and memory overhead than fully parallel architectures, and it converges faster than both serial and parallel architectures. In this paper, we introduce a modified form of shuffling for the Parity-Check Matrices (PCMs) of Quasi-Cyclic LDPC (QC-LDPC) codes, where shuffling is an interleaving of the rows of the PCM. Like the conventional shuffling method, the modified method yields a PCM in which each layer can be obtained by cyclically shifting the layer above it one symbol to the right. In addition, it guarantees that the weight of every column within each layer is either zero or one. We then show that, owing to these two properties, the number of occupied Look-Up Tables (LUTs) on a Field Programmable Gate Array (FPGA) is reduced by about 93% and the on-chip power consumption by nearly 80%. At the same time, shuffling does not degrade Bit Error Rate (BER) performance compared with the non-shuffled case. Moreover, decoding throughput is not sacrificed at low Signal-to-Noise Ratio (SNR) values, and its degradation remains negligible down to a BER of 1e-6.
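
The row interleaving and the two layer properties mentioned above can be illustrated with a minimal sketch. It covers only plain (conventional) row interleaving, not the modified method of this paper. It rests on several assumptions that are not stated in the abstract: the PCM is built from Z x Z circulant permutation blocks described by a small shift table (with -1 marking a zero block), a layer collects one row from every block row, and the one-symbol shift acts inside each Z-column block. All function names and the toy shift table are hypothetical.

```python
import numpy as np

def expand_qc(shifts, Z):
    """Expand a table of circulant shifts into a binary PCM.
    shifts[i][j] == -1 is a ZxZ zero block (assumed convention);
    otherwise the block is the identity cyclically shifted right."""
    mb, nb = len(shifts), len(shifts[0])
    H = np.zeros((mb * Z, nb * Z), dtype=np.uint8)
    eye = np.eye(Z, dtype=np.uint8)
    for i in range(mb):
        for j in range(nb):
            if shifts[i][j] >= 0:
                H[i*Z:(i+1)*Z, j*Z:(j+1)*Z] = np.roll(eye, shifts[i][j], axis=1)
    return H

def shuffle_rows(H, mb, Z):
    """Interleave rows so that layer k holds row k of every block row."""
    order = [i * Z + k for k in range(Z) for i in range(mb)]
    return H[order, :]

def shift_blocks_right(layer, Z):
    """Cyclically shift every Z-column block of a layer one position right."""
    rows, n = layer.shape
    return np.roll(layer.reshape(rows, n // Z, Z), 1, axis=2).reshape(rows, n)

def layers_are_cyclic(Hs, mb, Z):
    """True if each layer equals the previous layer shifted right by one."""
    layers = Hs.reshape(Z, mb, -1)
    return all(
        np.array_equal(layers[(k + 1) % Z], shift_blocks_right(layers[k], Z))
        for k in range(Z)
    )

def max_layer_column_weight(Hs, mb):
    """Largest column weight found inside any single layer."""
    layers = Hs.reshape(-1, mb, Hs.shape[1])
    return int(layers.sum(axis=1).max())

if __name__ == "__main__":
    Z = 4
    toy_shifts = [[0, 1, 2],
                  [0, 2, -1]]   # hypothetical table; block column 0 repeats shift 0
    Hs = shuffle_rows(expand_qc(toy_shifts, Z), mb=2, Z=Z)
    print(layers_are_cyclic(Hs, 2, Z))       # True
    print(max_layer_column_weight(Hs, 2))    # 2
```

In this toy table, two block rows share the same shift in the first block column, so plain interleaving leaves columns of weight 2 inside a layer. That is the situation the zero-or-one column-weight guarantee excludes. If the second row of the table is changed to `[1, 3, -1]`, the same check reports a maximum layer column weight of 1.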