Compared to single-aperture systems, optical synthetic aperture systems greatly improve spatial resolution, yet their images still exhibit a certain degree of blurring and contrast reduction. To address this challenge, numerous image restoration methods have been proposed. The recently proposed rotating rectangular synthetic aperture (RRSA) system replaces the conventional circular synthetic aperture with a rectangular aperture that rotates to capture a sequence of images of the same scene. Its foldable design and the absence of common-phase adjustment confer cost and complexity benefits. Because the captured degraded image sequence carries information about the target scene along multiple directions, multi-frame image fusion is required to restore it. However, most conventional methods introduce visual artifacts and require substantial computational time. In this paper, we propose a Dual-Domain Fusion Network (DDFNet), which restores multi-frame degraded images in both the spatial and frequency domains and fuses the results. DDFNet employs a nested U-Net architecture to capture local pixel-level relationships, facilitating the recovery of local features and structures from spatial-domain images. In parallel, we transform the input images into the frequency domain and use another nested U-Net to extract features from the normalized spectrum and phase, thereby improving the recovery of texture and edge information. Finally, the fusion model exploits multi-level features and contextual awareness to combine the spatial- and frequency-domain features, achieving high-quality fusion of the captured degraded image sequence. Extensive experiments demonstrate that our method achieves superior performance in both quantitative and qualitative assessments compared to state-of-the-art techniques.
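The frequency-domain branch described above takes a normalized spectrum and phase as input. The abstract does not specify the exact normalization used in DDFNet, so the following is only a minimal sketch, assuming a 2-D FFT whose log-magnitude is min-max normalized to [0, 1] and whose phase is rescaled from [-pi, pi] to [0, 1]; the function name `to_freq_features` is hypothetical.

```python
import numpy as np

def to_freq_features(img: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split an image into a normalized amplitude spectrum and phase map.

    A hypothetical preprocessing step for a frequency-domain branch:
    the shifted 2-D FFT is separated into magnitude and phase; the
    log-magnitude is min-max normalized and the phase is rescaled so
    both maps lie in [0, 1] and can serve as network inputs.
    """
    spec = np.fft.fftshift(np.fft.fft2(img))
    mag = np.log1p(np.abs(spec))                     # compress dynamic range
    mag = (mag - mag.min()) / (mag.max() - mag.min() + 1e-8)
    phase = (np.angle(spec) + np.pi) / (2 * np.pi)   # map [-pi, pi] -> [0, 1]
    return mag, phase

# Example: a random grayscale frame yields two same-sized feature maps.
frame = np.random.default_rng(0).random((64, 64))
mag, phase = to_freq_features(frame)
```

In practice each frame of the degraded sequence would be converted this way before being fed to the frequency-domain nested U-Net, while the raw frames go to the spatial-domain branch.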