Currently, under supervised learning, the dominant paradigm for knowledge transfer learning is to pre-train a model on a large-scale natural scene dataset and then fine-tune it on a small amount of task-specific labeled data. Unfortunately, owing to the diverse categories of imaging data and the difficulty of data annotation, no remote sensing dataset is large and uniform enough to support large-scale pre-training in the remote sensing domain (RSD). Moreover, pre-training models on large-scale natural scene datasets by supervised learning and then fine-tuning them directly on diverse downstream tasks is a crude approach that is easily affected by inevitable incorrect labeling, severe domain gaps, and task-aware discrepancies. Thus, in this paper, building on self-supervised pre-training and the powerful vision transformer (ViT) architecture, we propose a concise and effective knowledge transfer learning strategy called ConSecutive Pre-Training (CSPT), inspired by the idea of not stopping pre-training in natural language processing (NLP). CSPT gradually bridges the domain gap and transfers large-scale data knowledge to any specific domain (e.g., from the natural scene domain to the RSD). In addition, the proposed CSPT can also unlock the huge potential of unlabeled data for task-aware model training. Finally, extensive experiments were carried out on twelve remote sensing datasets covering three downstream tasks (scene classification, object detection, and land cover classification) and two types of imaging data (optical and synthetic aperture radar (SAR)). The results show that with CSPT for task-aware model training, almost all downstream tasks in the RSD outperform previous pre-training-based knowledge transfer learning strategies without any expensive manual labeling, and even surpass state-of-the-art (SOTA) performance without any careful network architecture design.
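
To make the strategy concrete, below is a minimal sketch of the three-stage CSPT pipeline in PyTorch. Everything in it is an illustrative assumption rather than the paper's implementation: a tiny convolutional encoder-decoder stands in for the ViT backbone, a crude pixel-level masking loss stands in for the actual self-supervised objective, and random tensors stand in for the natural scene corpus, the unlabeled remote sensing imagery, and the labeled downstream dataset.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def mim_step(model, images, mask_ratio=0.6):
    # Hide a random subset of pixels and train the model to reconstruct
    # them; a pixel-level stand-in for patch-level masked image modeling.
    mask = (torch.rand_like(images) < mask_ratio).float()
    recon = model(images * (1.0 - mask))
    return ((recon - images) ** 2 * mask).sum() / mask.sum().clamp(min=1.0)

def pretrain(model, loader, epochs=1, lr=1e-4):
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for (x,) in loader:
            loss = mim_step(model, x)
            opt.zero_grad(); loss.backward(); opt.step()

# Tiny encoder-decoder stand-in for the ViT backbone.
encoder = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
decoder = nn.Conv2d(16, 3, 3, padding=1)
backbone = nn.Sequential(encoder, decoder)

# Stage 1: self-supervised pre-training on a large natural scene corpus
# (random tensors stand in for the real data).
natural = DataLoader(TensorDataset(torch.rand(64, 3, 32, 32)), batch_size=16)
pretrain(backbone, natural)

# Stage 2, the "consecutive" step: continue the SAME self-supervised
# objective on unlabeled remote sensing imagery to bridge the domain gap.
remote = DataLoader(TensorDataset(torch.rand(64, 3, 32, 32)), batch_size=16)
pretrain(backbone, remote)

# Stage 3: supervised fine-tuning on the small labeled downstream task,
# reusing the encoder weights learned in stages 1-2.
head = nn.Sequential(nn.Flatten(), nn.Linear(16 * 32 * 32, 10))
classifier = nn.Sequential(encoder, head)
task = DataLoader(
    TensorDataset(torch.rand(64, 3, 32, 32), torch.randint(0, 10, (64,))),
    batch_size=16,
)
opt = torch.optim.AdamW(classifier.parameters(), lr=1e-4)
for x, y in task:
    loss = nn.functional.cross_entropy(classifier(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
```

The design point the sketch tries to capture is that stage 2 reuses the same self-supervised objective as stage 1 and only switches the data to the target domain, so no manual labels are needed until the final fine-tuning step.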