Because intensity-only measurements discard phase information, numerical reconstruction in inline digital holography suffers from the undesirable twin-image artifact. This artifact appears as an out-of-focus conjugate at the virtual image plane and degrades reconstruction quality. In this work, we propose a diffusion-based generative model that removes this defocus noise in single-shot inline digital holography. A diffusion model learns an implicit prior of the underlying data distribution by progressively injecting random noise into the data, and then generates high-quality samples by reversing this process. Although diffusion models have been successful in various challenging computer vision tasks, their potential in scientific imaging has not yet been fully explored; one challenge is the inherent randomness of the reverse sampling process. To address this issue, we incorporate the underlying physics of image formation as a prior, which constrains the possible samples from the data distribution. Specifically, we add a gradient correction step to each reverse sampling step to enforce data consistency and improve the results. We demonstrate the feasibility of our approach on simulated and experimental holograms and compare our results with previous methods. Our model recovers detailed object information and significantly suppresses the twin-image noise. The proposed method is explainable, generalizable, and transferable to other samples from various distributions, making it a promising tool for digital holographic reconstruction.
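The gradient correction described above can be illustrated with a minimal sketch. It is not the implementation used in this work. The abstract does not specify the forward model, loss, or step size, so the sketch assumes angular spectrum propagation, a real-valued object, a least-squares intensity loss, and a fixed step size. The names `transfer_function`, `consistency_grad`, `reverse_sample`, and the `denoise_step` callable (a stand-in for one reverse update of a trained diffusion model) are hypothetical.

```python
import numpy as np

def transfer_function(n, wavelength, pixel, z):
    """Angular spectrum transfer function on an n x n grid (assumed forward model)."""
    f = np.fft.fftfreq(n, d=pixel)
    fx, fy = np.meshgrid(f, f, indexing="ij")
    arg = 1.0 / wavelength**2 - fx**2 - fy**2
    kz = 2.0 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    # Evanescent components (arg <= 0) are dropped.
    return np.where(arg > 0, np.exp(1j * kz * z), 0.0)

def propagate(u, H):
    """Apply the linear propagation operator: multiply by H in Fourier space."""
    return np.fft.ifft2(H * np.fft.fft2(u))

def hologram(x, H):
    """Intensity-only measurement of the propagated field; phase is lost here."""
    return np.abs(propagate(x, H)) ** 2

def consistency_loss(x, y, H):
    """Least-squares mismatch between simulated and measured holograms."""
    return 0.5 * np.sum((hologram(x, H) - y) ** 2)

def consistency_grad(x, y, H):
    """Gradient of consistency_loss with respect to a real-valued object x."""
    field = propagate(x, H)
    residual = np.abs(field) ** 2 - y
    # Propagating with conj(H) applies the adjoint of the forward operator.
    return 2.0 * np.real(propagate(residual * field, np.conj(H)))

def reverse_sample(y, H, denoise_step, n_steps=50, step_size=1e-2, seed=0):
    """Reverse sampling with a gradient correction after every denoising update.

    denoise_step(x, t) is a hypothetical placeholder for the learned reverse
    update; it is an assumption of this sketch, not part of the source.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(y.shape)  # start from pure noise
    for t in reversed(range(n_steps)):
        x = denoise_step(x, t)                          # generative prior
        x = x - step_size * consistency_grad(x, y, H)   # physics-based correction
    return x
```

With a trained model in place of `denoise_step`, each iteration first moves the sample toward the learned data distribution and then pulls it toward agreement with the measured hologram `y`. This is one way to read the claim that the physics prior constrains the randomness of reverse sampling.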