Semantic segmentation techniques for remote sensing images (RSIs) have been widely developed and applied. However, most segmentation methods depend on sufficiently annotated data for specific scenarios. When a large change occurs in the target scenes, model performance drops significantly. Therefore, unsupervised domain adaptation (UDA) for semantic segmentation is proposed to alleviate the reliance on expensive per-pixel densely labeled data. In this paper, two key issues of existing domain adaptive (DA) methods are considered: (1) the factors that cause data distribution shifts in RSIs may be complex and diverse, and existing DA approaches cannot adaptively optimize for different domain discrepancy scenarios; (2) domain-invariant feature alignment, based on adversarial training (AT), is prone to excessive feature perturbation, leading to over robust models. To address these issues, we propose an AdvCDA method that guides the model to adapt adversarial perturbation consistency. We combine consistency regularization to consider interdomain feature alignment as perturbation information in the feature space, and thus propose a joint AT and self-training (ST) DA method to further promote the generalization performance of the model. Additionally, we propose a confidence estimation mechanism that determines network stream training weights so that the model can adaptively adjust the optimization direction. Extensive experiments have been conducted on Potsdam, Vaihingen, and LoveDA remote sensing datasets, and the results demonstrate that the proposed method can significantly improve the UDA performance in various cross-domain scenarios.