Recently, many deep learning-based methods have been developed for solving remote sensing (RS) scene classification or retrieval tasks. Most of the loss functions adopted for training these models require accurate annotations. However, the presence of noise in such annotations (also known as label noise) cannot be avoided in large-scale RS benchmark archives, as it results from geo-location/registration errors, land-cover changes, and the diverse knowledge backgrounds of annotators. To overcome the influence of noisy labels on the learning process of deep models, we propose a new loss function, called noise-tolerant deep neighborhood embedding, which can accurately encode the semantic relationships among RS scenes. Specifically, we aim at maximizing the leave-one-out K-NN score to uncover the inherent neighborhood structure among the images in the feature space. Moreover, we down-weight the contribution of potentially noisy images by learning their localized structure and pruning the images with low leave-one-out K-NN scores. Based on the newly proposed loss function, class-wise features can be discriminated more robustly. Our experiments, conducted on two benchmark RS datasets, validate the effectiveness of the proposed approach on three different RS scene interpretation tasks: classification, clustering, and retrieval. The code of this article will be publicly available at https://github.com/jiankang1991.
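
To make the scoring and pruning idea concrete, the following is a minimal sketch of how a leave-one-out K-NN score could be computed in feature space and used to filter potentially noisy samples. The function names (`leave_one_out_knn_scores`, `prune_noisy_samples`), the cosine-similarity metric, and the default `k` and `threshold` values are illustrative assumptions, not the exact formulation of the proposed loss.

```python
import numpy as np

def leave_one_out_knn_scores(features, labels, k=10):
    """For each sample, compute the fraction of its k nearest neighbors
    (by cosine similarity, excluding the sample itself) that share its
    label -- a simple proxy for the leave-one-out K-NN score.
    Assumption: cosine similarity and this score definition are
    illustrative, not the paper's exact formulation."""
    # L2-normalize so the dot product equals cosine similarity
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = feats @ feats.T
    np.fill_diagonal(sim, -np.inf)            # exclude self-matches (leave-one-out)
    knn_idx = np.argsort(-sim, axis=1)[:, :k]  # indices of the k most similar samples
    return (labels[knn_idx] == labels[:, None]).mean(axis=1)

def prune_noisy_samples(features, labels, k=10, threshold=0.5):
    """Return the indices of samples whose leave-one-out K-NN score
    reaches the threshold; low-scoring samples are treated as likely
    label noise and down-weighted/pruned. The threshold is a
    hypothetical hyperparameter for illustration."""
    scores = leave_one_out_knn_scores(features, labels, k)
    return np.where(scores >= threshold)[0]
```

In this sketch, a sample surrounded in feature space by neighbors of the same class receives a high score, while a mislabeled sample, whose neighbors mostly carry a different label, receives a low score and is pruned or down-weighted during training.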