Convolutional Neural Networks (CNNs) are widely used in content-based remote sensing image retrieval (CBRSIR). However, the features extracted by CNNs are not rotationally invariant, which is problematic for remote sensing (RS) images, where objects appear at arbitrary rotation angles. In addition, because RS images contain rich content and detail information, stacking multiple convolutional and pooling layers can cause information loss and weaken the model's ability to extract features. To address these problems, this paper proposes a proxy-based feature fusion network (PBFFN). First, a proxy-based Euclidean distance contrastive loss, which incorporates contrastive learning into a metric learning framework, constrains the embedding of a source image to be closer to the embedding of its rotated version than to that of any other image, endowing the model with a degree of rotation invariance. Second, a global correlation map is generated by multi-layer fusion; under its guidance, the features of each layer are fused to improve the feature extraction capability of the model and to reduce information loss as images flow through the network. Extensive experiments on two public remote sensing datasets show that the proposed method outperforms competing methods.
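To make the rotation-invariance objective concrete, the following is a minimal PyTorch sketch of a Euclidean-distance contrastive loss in the spirit the abstract describes: each image's embedding is pulled closer to the embedding of its rotated copy than to the embedding of any other image in the batch. The class name, margin value, and hinge formulation are illustrative assumptions, not the paper's published loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RotationContrastiveLoss(nn.Module):
    """Sketch of a Euclidean-distance contrastive loss (hypothetical form).

    Goal from the abstract: an image's embedding should lie closer to the
    embedding of its rotated copy than to any other image's embedding.
    """

    def __init__(self, margin: float = 0.5):
        super().__init__()
        self.margin = margin  # margin value is an illustrative assumption

    def forward(self, z: torch.Tensor, z_rot: torch.Tensor) -> torch.Tensor:
        # z, z_rot: (B, D) embeddings of the source images and of their
        # rotated versions; L2-normalise so distances are comparable.
        z = F.normalize(z, dim=1)
        z_rot = F.normalize(z_rot, dim=1)

        # Positive distances: each image vs. its own rotated copy.
        pos = (z - z_rot).pow(2).sum(dim=1)               # (B,)

        # Negative distances: each image vs. every other rotated embedding.
        dists = torch.cdist(z, z_rot).pow(2)              # (B, B)
        B = z.size(0)
        mask = ~torch.eye(B, dtype=torch.bool, device=z.device)
        neg = dists[mask].view(B, B - 1)                  # (B, B-1)

        # Hinge: each positive pair must beat every negative by the margin.
        return F.relu(pos.unsqueeze(1) - neg + self.margin).mean()
```

In training, `z_rot` would typically be the embedding of a randomly rotated copy of each batch image (e.g. produced with `torchvision.transforms.functional.rotate`). The multi-layer fusion step could likewise be sketched as below, assuming the global correlation map is a single-channel spatial weight computed from the resized, concatenated feature maps; the abstract does not specify the exact construction, so every detail here is hypothetical.

```python
# (reuses the torch / nn / F imports from the sketch above)
class CorrelationGuidedFusion(nn.Module):
    """Illustrative sketch: fuse multi-level CNN features under the
    guidance of a global correlation map (assumed single-channel)."""

    def __init__(self, channels: list[int], out_channels: int):
        super().__init__()
        total = sum(channels)
        self.corr = nn.Sequential(
            nn.Conv2d(total, 1, kernel_size=1),
            nn.Sigmoid(),                     # (B, 1, H, W) correlation map
        )
        self.fuse = nn.Conv2d(total, out_channels, kernel_size=1)

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        # Resize every level to the spatial size of the first (largest) map.
        h, w = feats[0].shape[-2:]
        feats = [F.interpolate(f, size=(h, w), mode="bilinear",
                               align_corners=False) for f in feats]
        x = torch.cat(feats, dim=1)
        # Reweight the stacked features with the global correlation map.
        return self.fuse(x * self.corr(x))
```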