In high-resolution remote sensing image retrieval (HRRSIR), convolutional neural networks (CNNs) hold a clear performance advantage over traditional handcrafted features. However, some CNN-based HRRSIR models are classification-oriented and pay no attention to similarity, which is critical to image retrieval, whereas others concentrate on learning similarity and fail to take full advantage of class label information. To address these issues, we propose a novel model called the classification-similarity network (CSN), which performs image classification and similarity prediction simultaneously. To further improve performance, we build and train two CSNs and combine two kinds of information from them, i.e., deep features and similarity scores, to measure the final similarity between two images. In addition, score fusion is conducted using the optimal fusion theorem from biometric authentication, which provides a theoretical scheme guaranteeing that fusion leads to better performance. Extensive experiments on publicly available datasets demonstrate that CSNs are distinctly superior to conventional CNNs and that our proposed "two CSNs + feature fusion + score fusion" method outperforms the state-of-the-art models.