Face super-resolution (FSR) models face a significant challenge when the input images are extremely low-resolution (16×16 pixels) and degraded. The resulting lack of crucial facial details in the low-level and intermediate stages of an FSR model hinders tasks such as face alignment and landmark detection, which in turn makes high-frequency details difficult to recover and leads to unfaithful, unrealistic super-resolved face images. This research proposes an FSR model with strategically designed multi-attention techniques to improve the recovery of facial attributes. The model incorporates a non-local (NL) module and a residual pixel attention technique at its low-level stage. At the mid-level stage, a Spatial Feature Transfer (SFT) module refines features by leveraging spatial information through an iterative interaction between an attentive module and a landmark estimation network. By combining these modules under an iterative collaboration framework, our method addresses the challenges of facial detail recovery and yields more refined feature representations. The proposed Multi-Stage Refining Face Super-Resolution (MSRFSR) model is evaluated on the CelebA, Helen, AFLW2000, and WFLW datasets at scale factors of ×8 and ×16. Extensive quantitative and qualitative experiments show that MSRFSR consistently outperforms state-of-the-art methods on all four datasets and at both scales.

INDEX TERMS Face image super-resolution, non-local attention, residual pixel attention, spatial feature transfer.
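
As a rough illustration of two of the building blocks named in the abstract, the sketch below shows one common way to formulate residual pixel attention (a per-pixel sigmoid gate added back to the input) and SFT-style modulation (a scale and shift predicted from a spatial condition such as landmark heatmaps). The abstract does not specify layer configurations, so the function names, the use of 1×1 convolutions, the channel counts, and the random weights are all assumptions made for illustration and may differ from the MSRFSR implementation.

```python
import numpy as np

def conv1x1(x, weight, bias):
    # x: (C_in, H, W); weight: (C_out, C_in); bias: (C_out,)
    # A 1x1 convolution is a per-pixel linear map over channels.
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def residual_pixel_attention(x, weight, bias):
    # Assumed form: a per-pixel, per-channel gate in (0, 1),
    # applied to the features and added back to the input.
    attn = sigmoid(conv1x1(x, weight, bias))
    return x + x * attn

def sft_modulate(features, condition, w_gamma, b_gamma, w_beta, b_beta):
    # Assumed form: the spatial condition (e.g. landmark heatmaps)
    # predicts a scale (gamma) and shift (beta) for every feature pixel.
    gamma = conv1x1(condition, w_gamma, b_gamma)
    beta = conv1x1(condition, w_beta, b_beta)
    return gamma * features + beta

# Hypothetical sizes: 8 feature channels, 5 heatmap channels, 16x16 input.
rng = np.random.default_rng(0)
C, K, H, W = 8, 5, 16, 16
feats = rng.standard_normal((C, H, W))
heatmaps = rng.random((K, H, W))

low_level = residual_pixel_attention(
    feats, 0.1 * rng.standard_normal((C, C)), np.zeros(C)
)
mid_level = sft_modulate(
    low_level, heatmaps,
    0.1 * rng.standard_normal((C, K)), np.ones(C),
    0.1 * rng.standard_normal((C, K)), np.zeros(C),
)
print(low_level.shape, mid_level.shape)
```

Both operations preserve the spatial size of the feature map, which is what allows the landmark-derived condition to steer the features pixel by pixel. In an iterative scheme like the one described, the condition would be re-estimated from each intermediate super-resolved output.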