Side-scan sonar (SSS) is an essential acoustic sensor for obtaining underwater information. Instance segmentation of sonar images can effectively locate and detect underwater objects. Although various CNN-based frameworks have achieved promising results in instance segmentation of natural images, the noise interference, highlight and shadow regions, and blurred edges in sonar images pose greater challenges for sonar image instance segmentation. To address these problems, we propose a novel recurrent pyramid frequency feature fusion network (RPFNet), which mainly consists of a pyramid frequency feature fusion network (PFN), a recurrent residual attention mechanism (RRAM), a mask prediction module, and a semantic segmentation module. By enhancing and fusing different frequency features of SSS images, the PFN effectively extracts fine-grained features and reduces interference from background information. The RRAM uses a residual structure and an attention mechanism to strengthen the correlation between different frequency features and to improve nonlinear feature representation. The mask prediction and semantic segmentation modules discriminate object instance categories and generate the corresponding semantic segmentation results. We extensively evaluate the proposed framework on a real-scene sonar image dataset. The evaluation results demonstrate that the proposed method outperforms state-of-the-art methods across various evaluation metrics. Code is available at https://github.com/darkseid-arch/SonarRPFNet.
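To make the idea of "different frequency features" concrete, the following is a minimal illustrative sketch and not the PFN itself, whose layers are learned and are not specified in this abstract. It assumes a fixed FFT low-pass mask to split a single-channel image into low- and high-frequency parts and then recombines them with hand-picked weights. The function names `split_frequencies` and `fuse`, the `cutoff` parameter, and the weights are all hypothetical.

```python
import numpy as np

def split_frequencies(image, cutoff=0.1):
    """Split a 2-D image into low- and high-frequency components.

    Assumption: an ideal (hard) radial low-pass mask in the FFT domain.
    `cutoff` is a hypothetical radius in cycles per pixel (max 0.5 per axis).
    """
    h, w = image.shape
    fy = np.fft.fftfreq(h)[:, None]          # vertical frequencies
    fx = np.fft.fftfreq(w)[None, :]          # horizontal frequencies
    mask = np.sqrt(fx ** 2 + fy ** 2) <= cutoff
    spectrum = np.fft.fft2(image)
    low = np.real(np.fft.ifft2(spectrum * mask))   # smooth background content
    high = image - low                             # edges and fine detail
    return low, high

def fuse(low, high, w_low=0.5, w_high=1.5):
    """Weighted recombination; the weights are illustrative, not learned."""
    return w_low * low + w_high * high

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((64, 64))               # stand-in for a sonar image patch
    low, high = split_frequencies(img)
    enhanced = fuse(low, high)
    print(enhanced.shape)
```

With both weights set to 1, `fuse` returns the original image, so the weights directly control how strongly fine detail is emphasized relative to the smooth background. In a learned network, this fixed split and weighting would be replaced by trainable layers.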