In recent years, many convolutional neural network (CNN)-based methods have been proposed for scene classification of remote sensing (RS) images. Because RS datasets generally contain few training samples, data augmentation is often used to expand the training set. Conventional data augmentation, however, changes the content of an image while keeping its original label, which is not always appropriate. In this study, label augmentation (LA) is presented to make fuller use of the training set: each generated image is assigned a joint label that encodes both its scene category and the data augmentation applied to it. In the test process, the outputs for the images obtained by the different data augmentations are aggregated. The augmented samples, however, increase the intra-class diversity of the training set, which makes the subsequent classification more difficult. To address this issue and further improve classification accuracy, the Kullback–Leibler (KL) divergence is used to constrain the output distributions of two training samples with the same scene category to be consistent. Extensive experiments were conducted on the widely used UCM, AID and NWPU datasets. The proposed method surpasses other state-of-the-art methods in terms of classification accuracy. For example, on the challenging NWPU dataset, an overall accuracy of 91.05% is obtained with a 10% training ratio.
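The three ingredients named above (joint labels, aggregation at test time, and a KL consistency term) can be illustrated with a minimal NumPy sketch. It is not the implementation used in this study. The choice of four rotations as the augmentations, the encoding `class_idx * num_rotations + rot_idx`, the rule of averaging only the logits whose augmentation index matches the view, and all function names are assumptions made for illustration.

```python
import numpy as np

NUM_ROTATIONS = 4  # assumption: augmentations are rotations by 0, 90, 180, 270 degrees


def joint_label(class_idx, rot_idx, num_rotations=NUM_ROTATIONS):
    """Encode (scene class, augmentation) as one joint label (assumed encoding)."""
    return class_idx * num_rotations + rot_idx


def augment_with_joint_labels(image, class_idx):
    """Return each rotated copy of an H x W x C image with its joint label."""
    return [(np.rot90(image, k), joint_label(class_idx, k))
            for k in range(NUM_ROTATIONS)]


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def aggregate_test_logits(view_logits, num_classes):
    """Aggregate joint logits from all rotated views into per-class scores.

    view_logits[k] holds the joint logits (length num_classes * NUM_ROTATIONS)
    predicted for the test image rotated by k quarter turns. For view k, only
    the logits whose augmentation index equals k are kept, then the views are
    averaged (an assumed aggregation rule).
    """
    logits = np.asarray(view_logits, dtype=float)
    logits = logits.reshape(NUM_ROTATIONS, num_classes, NUM_ROTATIONS)
    matched = np.stack([logits[k, :, k] for k in range(NUM_ROTATIONS)])
    return matched.mean(axis=0)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete output distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * (np.log(p) - np.log(q))))
```

In training, `kl_divergence(softmax(logits_a), softmax(logits_b))` would be added to the classification loss for two samples `a` and `b` that share a scene category. Whether the term is one-directional or symmetric, and how it is weighted, is not stated here and is left open in this sketch.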