Remote sensing scene classification converts remote sensing images into classification information that supports high-level applications, which makes it a fundamental problem in the field of remote sensing. In recent years, many convolutional neural network (CNN)-based methods have achieved impressive results in remote sensing scene classification, but they have two problems when extracting scene features: (1) fixed-shape convolutional kernels cannot effectively extract features from remote sensing scenes with complex shapes and diverse distributions; (2) the features extracted by CNNs contain a large amount of redundant and invalid information. To solve these problems, this paper constructs a deformable convolutional neural network that adapts the convolutional sampling positions to the shapes of objects in the remote sensing scene. Meanwhile, spatial and channel attention mechanisms are used to focus on the effective features while suppressing the invalid ones. The experimental results indicate that the proposed method is competitive with state-of-the-art methods on three remote sensing scene classification datasets (UCM, NWPU, and AID).
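
The two ideas named above can be illustrated with a minimal, dependency-free sketch. It is not the network proposed in this paper: the function names, the hand-set offsets, and the scalar gates that stand in for learned layers are all assumptions made for illustration. The first part evaluates a 3x3 kernel at sampling positions shifted by per-tap offsets, using bilinear interpolation; the second part rescales features with simple channel and spatial gates.

```python
import math

def bilinear(img, y, x):
    """Sample a 2-D list at fractional (y, x); points outside the image read as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    val = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                val += (1 - abs(y - yy)) * (1 - abs(x - xx)) * img[yy][xx]
    return val

def deformable_conv_at(img, kernel, offsets, cy, cx):
    """Response of a 3x3 deformable kernel centred at (cy, cx).

    offsets[k] = (dy, dx) shifts the k-th tap of the regular grid. Here the
    offsets are supplied by hand (assumption); in a deformable CNN they are
    predicted from the input by an extra convolutional layer.
    """
    out, k = 0.0, 0
    for i in (-1, 0, 1):
        for j in (-1, 0, 1):
            dy, dx = offsets[k]
            out += kernel[i + 1][j + 1] * bilinear(img, cy + i + dy, cx + j + dx)
            k += 1
    return out

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def channel_attention(feats, weights):
    """Scale each channel by a gate computed from its global average.

    weights[c] is a hypothetical scalar standing in for a learned layer.
    """
    out = []
    for fmap, wc in zip(feats, weights):
        mean = sum(map(sum, fmap)) / (len(fmap) * len(fmap[0]))
        gate = sigmoid(wc * mean)
        out.append([[gate * v for v in row] for row in fmap])
    return out

def spatial_attention(feats):
    """Scale each location by a gate computed from the cross-channel mean."""
    c, h, w = len(feats), len(feats[0]), len(feats[0][0])
    gate = [[sigmoid(sum(feats[k][y][x] for k in range(c)) / c)
             for x in range(w)] for y in range(h)]
    return [[[gate[y][x] * feats[k][y][x] for x in range(w)]
             for y in range(h)] for k in range(c)]
```

With all offsets set to zero, `deformable_conv_at` reduces to an ordinary 3x3 convolution at that position, which is a convenient sanity check; non-zero offsets move the sampling grid off the regular lattice so that it can follow object shape.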