Zero-shot remote sensing scene classification refers to the classification of images from scene classes unseen during training and has become a topic of growing interest in the field of remote sensing. Semantic autoencoders are one of the mainstream zero-shot learning methods. However, such autoencoders may not be discriminative enough for remote sensing scene images, which exhibit high within-class diversity and high between-class similarity. To address this issue, we propose a distance-constrained semantic autoencoder (DSAE) for zero-shot remote sensing scene classification. More specifically, we learn a semantic autoencoder for the seen scene classes, which aligns the visual space with the semantic space. To improve the discriminative ability of this autoencoder, we impose a discriminative distance metric constraint that minimizes the Euclidean distances between the encoded vectors of samples of the same class and maximizes the Euclidean distances between the encoded vectors of samples of different classes. Additionally, we learn a semantic autoencoder for the unseen scene classes to alleviate the domain shift problem. Extensive experiments on three benchmark remote sensing scene datasets demonstrate the superiority of the proposed method over state-of-the-art methods.
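
To make the described objective concrete, the following is a minimal sketch of one plausible instantiation of such a distance-constrained semantic autoencoder loss. It assumes the standard linear, tied-weight semantic autoencoder of Kodirov et al. (2017) combined with within-class and between-class Euclidean distance terms as described above; the function name dsae_loss, the projection W, and the weights lam and beta are illustrative assumptions, not taken from the paper, and the exact formulation in the proposed method may differ.

    import torch

    def dsae_loss(X, S, y, W, lam=1.0, beta=0.1):
        """X: (n, d) visual features; S: (n, k) per-sample class semantic
        vectors; y: (n,) integer class labels; W: (k, d) projection."""
        Z = X @ W.T                        # encode visual features into semantic space
        recon = ((X - Z @ W) ** 2).sum()   # reconstruction with tied decoder W^T
        align = ((Z - S) ** 2).sum()       # align encodings with class semantics
        D = torch.cdist(Z, Z) ** 2         # pairwise squared Euclidean distances
        same = (y[:, None] == y[None, :]).float()
        intra = (D * same).sum()           # pull same-class encodings together
        inter = (D * (1.0 - same)).sum()   # push different-class encodings apart
        return recon + lam * align + beta * (intra - inter)

    # Usage sketch: optimize W by gradient descent on seen-class data.
    n, d, k = 64, 512, 50
    X = torch.randn(n, d)
    y = torch.randint(0, 5, (n,))
    S = torch.randn(5, k)[y]               # look up a semantic vector per sample
    W = torch.randn(k, d, requires_grad=True)
    opt = torch.optim.Adam([W], lr=1e-3)
    for _ in range(100):
        opt.zero_grad()
        loss = dsae_loss(X, S, y, W)
        loss.backward()
        opt.step()

Note that an unbounded between-class term can dominate the objective; in practice such constraints are often bounded or margin-based, which is a design choice the sketch leaves out for brevity.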