Remote sensing image scene classification faces challenges such as differences in semantic granularity across scene categories and imbalanced sample numbers, which cause deep convolutional networks (DCNs) to learn incorrect features. This paper proposes a multiple granularity semantic learning network (MGSN), comprising a multiple granularity semantic learning (MGSL) module and a nonuniform sampling augmentation (NUA) module. Specifically, the MGSL module makes full use of scene semantic information at different granularities, guiding the network to learn global and local features simultaneously. The relationship between semantic features of different granularities is also explored, showing that learning coarse-grained features helps improve the learning of fine-grained semantic features, whereas learning fine-grained semantics can inhibit the learning of coarse-grained semantic features. The NUA module combines sampling and sample augmentation to balance the sample distribution, avoiding the overfitting caused by oversampling. The proposed MGSN achieves state-of-the-art classification accuracy on two large-scale remote sensing image scene classification datasets, Million-AID and NWPU-RESISC45. With 10% and 20% of the NWPU-RESISC45 dataset used for training, MGSN achieves 91.92% and 94.33% top-1 accuracy, respectively. In experiments on the Million-AID dataset, the proposed MGSN performed best among 18 DCNs. Compared with the baseline, FixEfficientNet, MGSN improved top-1 and top-5 accuracy by 10.63% and 5.47%, respectively, at low complexity cost.
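As a rough illustration of the two ideas named above, and not the paper's implementation, the sketch below pairs a coarse-grained and a fine-grained classification head over shared backbone features with a joint loss (so coarse supervision acts as an auxiliary signal for fine-grained learning), and builds an inverse-frequency sampler as a stand-in for nonuniform sampling; in practice the oversampled minority-class draws would be paired with stronger augmentation to limit overfitting. The feature dimension, class counts, and loss weight `alpha` are illustrative assumptions.

```python
# Minimal sketch (assumed design, not the authors' code): a two-head
# multi-granularity classifier with a joint coarse + fine loss, plus a
# class-balanced (inverse-frequency) sampler for nonuniform sampling.
import torch
import torch.nn as nn
from torch.utils.data import WeightedRandomSampler


class MultiGranularityHead(nn.Module):
    """Shared backbone features feed a coarse-grained and a fine-grained classifier."""

    def __init__(self, feat_dim, num_coarse, num_fine):
        super().__init__()
        self.coarse_fc = nn.Linear(feat_dim, num_coarse)  # global / coarse semantics
        self.fine_fc = nn.Linear(feat_dim, num_fine)      # local / fine semantics

    def forward(self, feats):
        return self.coarse_fc(feats), self.fine_fc(feats)


def multi_granularity_loss(coarse_logits, fine_logits, coarse_y, fine_y, alpha=0.5):
    """Joint loss: coarse-grained supervision serves as an auxiliary signal."""
    ce = nn.functional.cross_entropy
    return alpha * ce(coarse_logits, coarse_y) + (1 - alpha) * ce(fine_logits, fine_y)


def balanced_sampler(fine_labels):
    """Inverse-frequency sampling so rare scene classes are drawn more often;
    the resampled minority images would additionally receive stronger augmentation."""
    counts = torch.bincount(fine_labels)
    weights = 1.0 / counts[fine_labels].float()
    return WeightedRandomSampler(weights, num_samples=len(fine_labels), replacement=True)


if __name__ == "__main__":
    # Toy usage: random features/labels stand in for a real backbone and dataset.
    feats = torch.randn(8, 256)
    coarse_y = torch.randint(0, 4, (8,))
    fine_y = torch.randint(0, 45, (8,))
    head = MultiGranularityHead(256, num_coarse=4, num_fine=45)
    c_logits, f_logits = head(feats)
    loss = multi_granularity_loss(c_logits, f_logits, coarse_y, fine_y)
    loss.backward()
    print(float(loss))
```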