One of the challenges in the field of remote sensing is how to automatically identify and classify high-resolution remote sensing images. A number of approaches have been proposed. Among them, the methods based on low-level visual features and middle-level visual features have limitations. Therefore, this paper adopts the method of deep learning to classify scenes of high-resolution remote sensing images to learn semantic information. Most of the existing methods of convolutional neural networks are based on the existing model using transfer learning, while there are relatively few articles about designing of new convolutional neural networks based on the existing high-resolution remote sensing image datasets. In this context, this paper proposes a multi-view scaling strategy, a new convolutional neural network based on residual blocks and fusing strategy of pooling layer maps, and uses optimization methods to make the convolutional neural network named RFPNet more robust. Experiments on two benchmark remote sensing image datasets have been conducted. On the UC Merced dataset, the test accuracy, precision, recall, and F1-score all exceed 93%. On the SIRI-WHU dataset, the test accuracy, precision, recall, and F1-score all exceed 91%. Compared with the existing methods, such as the most traditional methods and some deep learning methods for scene classification of high-resolution remote sensing images, the proposed method has higher accuracy and robustness.