Scene classification is an important task in Remote Sensing (RS) applications, and many efforts have been made to improve its accuracy. It remains a challenging problem, especially for large datasets containing tens of thousands of images spread over many classes and acquired under varying conditions. One recurring difficulty is that, for a given scene, only part of the image indicates the class it belongs to, while the remaining regions are either irrelevant or more characteristic of another class. To address this issue, this paper proposes a deep attention Convolutional Neural Network (CNN) for scene classification in remote sensing. CNN models use successive convolutional layers to learn feature maps from progressively larger regions (receptive fields) of the scene, and the attention mechanism computes a new feature map as a weighted average of these original feature maps. In particular, we propose a solution, named EfficientNet-B3-Attn-2, based on the pre-trained EfficientNet-B3 CNN enhanced with an attention mechanism: a dedicated branch is attached to layer 262 of the network to compute the required weights. These weights are learned automatically by training the whole CNN model end-to-end with the backpropagation algorithm. In this way, the network learns to emphasize the regions of the scene that are important for classification and to suppress those that are irrelevant. We evaluated the proposed EfficientNet-B3-Attn-2 on six popular remote sensing datasets, namely UC Merced, KSA, OPTIMAL-31, RSSCN7, WHU-RS19, and AID, and the results demonstrate its strong capability in classifying RS scenes.
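To make the described idea concrete, the following is a minimal sketch of an EfficientNet-B3 backbone with an attention branch that re-weights the feature maps before classification. It assumes the tf.keras EfficientNetB3 implementation, attaches the branch to the backbone output rather than the specific layer 262 named above, and uses a simple 1x1 convolution with a sigmoid as the weighting branch; the exact EfficientNet-B3-Attn-2 branch design may differ.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import EfficientNetB3


def build_attention_model(num_classes, input_shape=(300, 300, 3)):
    """Hypothetical sketch: EfficientNet-B3 with a spatial-attention branch.

    The branch produces a per-location weight that rescales the backbone
    feature maps before global pooling, so the classifier emphasizes the
    discriminative regions of the scene and suppresses irrelevant ones.
    """
    # Pre-trained backbone without its classification head.
    backbone = EfficientNetB3(include_top=False, weights="imagenet",
                              input_shape=input_shape)
    features = backbone.output  # feature maps of shape (H, W, C)

    # Attention branch (simplified stand-in for the branch at layer 262):
    # a 1x1 convolution with sigmoid gives one weight in [0, 1] per location.
    attn = layers.Conv2D(1, kernel_size=1, activation="sigmoid",
                         name="attention_weights")(features)

    # Re-weight the original feature maps with the learned attention map.
    weighted = layers.Multiply(name="attention_apply")([features, attn])

    # Classification head on the re-weighted features.
    x = layers.GlobalAveragePooling2D()(weighted)
    outputs = layers.Dense(num_classes, activation="softmax")(x)

    return models.Model(backbone.input, outputs,
                        name="EfficientNetB3_Attn_sketch")


# Example usage: the whole model, including the attention branch, is trained
# end-to-end so the attention weights are learned by backpropagation.
model = build_attention_model(num_classes=21)  # e.g. UC Merced has 21 classes
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```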