Deep learning has shown superiority in change detection (CD) tasks, most notably the Transformer architecture, whose self-attention mechanism captures long-range dependencies and outperforms traditional models. This capability gives the Transformer a significant advantage in capturing global-level features of complex object changes in high-resolution remote sensing images. Although Transformers are mature in natural language processing (NLP), their application to computer vision, and to CD tasks in particular, is still nascent, and current Transformer-based CD methods exhibit limitations, especially under varied lighting and seasonal changes. To address this, we propose VisionTwinNet, a two-stage strategy. First, Gated EnhanceClearNet, a specially designed deep network, reduces image noise and enhances brightness while preserving shadows and correcting color distortions. With its gating mechanism, the network adaptively adjusts the importance of features, yielding superior performance across a variety of remote sensing image degradations. Second, we develop Hybrid Light-Robust CDNet, a hybrid, robust, lightweight network custom-designed for CD in remote sensing images. This module deeply integrates the strengths of CNNs and Transformers and introduces an innovative attention design that optimizes the key and value dimensions separately, rather than adopting a single traditional linear transformation, ensuring efficient detection. Specifically, the LR-Transformer Block employs a lightweight multi-head self-attention mechanism that improves computational efficiency while providing richer feature representations. Comparative studies against six CD methods on three public datasets validate VisionTwinNet's robustness and efficacy; our approach notably reduces algorithmic complexity and improves model efficiency.
INDEX TERMS Automatically adjustable framework, change detection, deep learning, multi-scale feature extraction, transformer.
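The abstract states that Gated EnhanceClearNet uses a gating mechanism to adaptively adjust the importance of features. The exact layer definitions are not given here; the following is a minimal NumPy sketch of the general idea, where a learned sigmoid gate reweights each channel of a feature map. The function name `gated_feature` and the weight shapes are hypothetical, not the paper's actual architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_feature(feat, Wg, bg):
    """Adaptively reweight features with a learned gate (illustrative only).

    feat: (n, d) feature vectors; Wg: (d, d) gate weights; bg: (d,) gate bias.
    The gate lies in (0, 1), so each channel is scaled by its learned importance.
    """
    gate = sigmoid(feat @ Wg + bg)   # (n, d), one importance weight per channel
    return feat * gate               # element-wise modulation of the features

# Toy usage with random weights standing in for learned parameters.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 32))
Wg = rng.standard_normal((32, 32)) / np.sqrt(32)
bg = np.zeros(32)
out = gated_feature(feat, Wg, bg)
print(out.shape)  # (8, 32)
```

Because the gate is bounded in (0, 1), the mechanism can only attenuate channels, which is one simple way such a network could suppress noise-dominated features while passing informative ones through.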
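The LR-Transformer Block is described as a lightweight multi-head self-attention that optimizes the key/value dimensions separately instead of a single linear transformation. As a rough sketch of what decoupled key/value dimensions look like, here is standard multi-head self-attention in NumPy where the per-head query/key width `d_qk` and value width `d_v` are independent hyperparameters; all names and dimension choices are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def light_mhsa(x, heads, d_qk, d_v, rng):
    """Multi-head self-attention with decoupled key/value widths (sketch).

    x: (n, d) tokens. Choosing d_qk < d_v (or both < d // heads) shrinks the
    projection matrices, which is one way to reduce attention cost.
    """
    n, d = x.shape
    Wq = rng.standard_normal((d, heads * d_qk)) / np.sqrt(d)
    Wk = rng.standard_normal((d, heads * d_qk)) / np.sqrt(d)
    Wv = rng.standard_normal((d, heads * d_v)) / np.sqrt(d)
    Wo = rng.standard_normal((heads * d_v, d)) / np.sqrt(heads * d_v)
    # Separate projections: queries/keys and values get their own widths.
    q = (x @ Wq).reshape(n, heads, d_qk).transpose(1, 0, 2)  # (h, n, d_qk)
    k = (x @ Wk).reshape(n, heads, d_qk).transpose(1, 0, 2)  # (h, n, d_qk)
    v = (x @ Wv).reshape(n, heads, d_v).transpose(1, 0, 2)   # (h, n, d_v)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_qk)) # (h, n, n)
    out = (attn @ v).transpose(1, 0, 2).reshape(n, heads * d_v)
    return out @ Wo                                          # back to (n, d)

rng = np.random.default_rng(1)
x = rng.standard_normal((16, 64))
y = light_mhsa(x, heads=4, d_qk=8, d_v=16, rng=rng)
print(y.shape)  # (16, 64)
```

With `heads=4`, `d_qk=8`, `d_v=16`, the combined Q/K/V projections use fewer parameters than the conventional choice of `d_qk = d_v = d // heads = 16` for all three, which illustrates the kind of trade-off a "lightweight" attention design exploits.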