Satellite
image time series (SITS)
classification is a major research topic in remote sensing and is relevant for
a wide range of applications. Deep learning approaches have been commonly employed
for SITS classification and have provided state-of-the-art performance. However,
deep learning methods suffer from overfitting when labeled data is scarce. To address
this problem, we propose a novel self-supervised pre-training scheme to initialize
a Transformer-based network using large-scale unlabeled data. Specifically,
the model is trained to predict randomly contaminated observations given the
entire time series of a pixel. The main idea of our proposal is to leverage the
inherent temporal structure of satellite time series to learn general-purpose
spectral-temporal representations related to land cover semantics.
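The following is a minimal PyTorch sketch of such a masked-reconstruction pretext task, not the authors' released implementation: the class name, hyperparameters (e.g. mask_ratio, noise_std), and contamination-by-noise choice are illustrative assumptions, and positional/date encodings are omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedSITSPretraining(nn.Module):
    """Illustrative BERT-style pretext task for pixel time series:
    randomly contaminate some observations, then train a Transformer
    encoder to reconstruct the original spectra at those positions."""

    def __init__(self, n_bands=10, d_model=64, n_heads=4, n_layers=3):
        super().__init__()
        self.embed = nn.Linear(n_bands, d_model)  # per-timestep spectral embedding
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_bands)   # reconstruct the input spectra

    def forward(self, x, mask_ratio=0.15, noise_std=0.5):
        # x: (batch, seq_len, n_bands), one time series per pixel
        mask = torch.rand(x.shape[:2], device=x.device) < mask_ratio
        corrupted = x.clone()
        corrupted[mask] += noise_std * torch.randn_like(x)[mask]  # contaminate
        recon = self.head(self.encoder(self.embed(corrupted)))
        # supervise only the contaminated observations
        return F.mse_loss(recon[mask], x[mask])

model = MaskedSITSPretraining()
batch = torch.randn(8, 24, 10)  # 8 pixels, 24 acquisition dates, 10 bands
loss = model(batch)             # unlabeled data suffices for this loss
loss.backward()
```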
Once pre-training is complete, the pre-trained network can be further adapted
to various SITS classification tasks by fine-tuning all the model parameters
on small-scale task-related labeled data.
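A corresponding fine-tuning sketch, again illustrative rather than the released code, reuses the pre-trained encoder from the snippet above and updates every parameter; the checkpoint path and class count are hypothetical stand-ins.

```python
# Reuses MaskedSITSPretraining (and its d_model=64) from the sketch above.
pretrained = MaskedSITSPretraining()
# pretrained.load_state_dict(torch.load("sits_pretrained.pt"))  # hypothetical checkpoint

class SITSClassifier(nn.Module):
    def __init__(self, backbone, n_classes):
        super().__init__()
        self.embed, self.encoder = backbone.embed, backbone.encoder
        self.cls_head = nn.Linear(64, n_classes)  # d_model from the backbone

    def forward(self, x):
        h = self.encoder(self.embed(x))      # (batch, seq_len, d_model)
        return self.cls_head(h.mean(dim=1))  # temporal mean pooling, then classify

clf = SITSClassifier(pretrained, n_classes=12)     # 12 land-cover classes, assumed
opt = torch.optim.Adam(clf.parameters(), lr=1e-4)  # all parameters are trainable
x, y = torch.randn(8, 24, 10), torch.randint(0, 12, (8,))  # stand-in labeled batch
opt.zero_grad()
F.cross_entropy(clf(x), y).backward()
opt.step()
```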
In this way, the general knowledge and representations about SITS can be
transferred to a label-scarce task, thereby
improving the generalization performance of the model as well as reducing the
risk of overfitting. Comprehensive experiments have been carried out on three
benchmark datasets over large study areas. Experimental results demonstrate the
effectiveness of the proposed method, which improves classification accuracy
by 2.38% to 5.27%. The code and the pre-trained model will be
available at https://github.com/linlei1214/SITS-BERT upon publication.

This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.