Low-Density-Parity-Check (LDPC) codes have gained popularity in communication systems and standards due to their capacity-approaching error-correction performance. In this paper, we first expose the tradeoff between decoding performance and hardware performance across three LDPC hard-decision decoding algorithms: Gallager B (GaB), Gradient Descent Bit Flipping (GDBF), and Probabilistic Gradient Descent Bit Flipping (PGDBF). We show that GaB architecture delivers the best throughput while using fewest Field Programmable Gate Array (FPGA) resources, however performs the worst in terms of decoding performance. We then modify the GaB architecture, introduce a new Probabilistic stimulation function (PGaB), and achieve dramatic decoding performance improvement over the GaB, exceeding the performance of GDBF, without sacrificing its superior maximum operating frequency.