Face verification and recognition are important tasks that have made great progress in recent years. However, recognizing low-resolution faces from small images is still a difficult problem. In this paper, we advocate using diffusion models (DMs) to enhance face resolution and improve their quality for various downstream applications. Most existing DMs for super-resolution use U-Net as their backbone network, which only exploits multi-scale features along the spatial dimension. These approaches result in a slow convergence of corresponding DMs and the inability to capture complex details and fine textures. To address this issue, we propose a novel conditional generative model based on DMs called BPSR3, which replaces the U-Net in super-resolution via repeated refinement (SR3) with a multi-scale deep back-projection network structure. BPSR3 can extract richer features not only in depth but also in breadth. This helps to effectively refine the image quality at different scales. The experimental results on facial datasets show that BPSR3 significantly improved both convergence speed and reconstruction performance. BPSR3 has about 1/4 of the parameters of SR3 but achieves a 50.1% improvement in PSNR, a 19.8% improvement in SSIM, and a 15.4% reduction in FID. Our contribution lies in achieving less time and space consumption and better reconstruction results. In addition, we propose an idea of enhancing the performance of DMs by replacing the U-Net with a better network.