The self-attention mechanism overcomes the limited receptive field of convolution: it models dependencies at a global scope and extracts global information efficiently. In this work, we propose ESACD, a lightweight model for building change detection in remote sensing images. In the encoder, we replace the ordinary convolution operation with an enhanced self-attention layer, the CoT layer, which fuses dynamic context with static context. Compared with an ordinary convolutional layer, this strategy fully mines the local features among the input keys and uses them to dynamically enhance the feature representation. Subsequently, we use dual attention, which consists of the HiLo attention mechanism and the Tokenizer attention mechanism, to further mine the low-frequency and high-frequency information of the images and the semantic features of interest to the model. HiLo extracts high-frequency and low-frequency information through two branches: the high-frequency branch applies self-attention within nonoverlapping windows of the features, while the low-frequency branch first average-pools the features and then applies self-attention. Tokenizer attention extracts the feature tokens of interest to the model, encodes their information, and then converts the tokens into pixel-level features, which makes feature extraction efficient and enhances the representation ability of the model. Finally, we fuse the feature information to improve information flow and accuracy. In experiments on two change detection datasets, ESACD outperforms other state-of-the-art methods while using fewer parameters.
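
The two HiLo branches described above can be illustrated with a minimal pure-Python sketch. It is not the ESACD implementation, and it makes several simplifying assumptions: there are no learned query, key, or value projections and only one head per branch; the low-frequency branch takes queries from the full-resolution features and keys and values from the pooled features; and the two branch outputs are concatenated along the channel axis. The function names (`attend`, `hi_branch`, `lo_branch`, `hilo`) and the window size `s` are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def attend(queries, keys, values):
    """Scaled dot-product attention over lists of vectors (no learned weights)."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        w = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        out.append([sum(wi * v[c] for wi, v in zip(w, values))
                    for c in range(len(values[0]))])
    return out

def window_tokens(fmap, r0, c0, s):
    """Collect the s x s tokens of the window whose top-left corner is (r0, c0)."""
    return [fmap[r][c] for r in range(r0, r0 + s) for c in range(c0, c0 + s)]

def hi_branch(fmap, s):
    """High-frequency branch: self-attention inside each nonoverlapping window."""
    h, w = len(fmap), len(fmap[0])  # assumes h and w are divisible by s
    out = [[None] * w for _ in range(h)]
    for r0 in range(0, h, s):
        for c0 in range(0, w, s):
            toks = window_tokens(fmap, r0, c0, s)
            for i, vec in enumerate(attend(toks, toks, toks)):
                out[r0 + i // s][c0 + i % s] = vec
    return out

def lo_branch(fmap, s):
    """Low-frequency branch: average-pool each window, then attend to the pooled tokens."""
    h, w, d = len(fmap), len(fmap[0]), len(fmap[0][0])
    pooled = []
    for r0 in range(0, h, s):
        for c0 in range(0, w, s):
            toks = window_tokens(fmap, r0, c0, s)
            pooled.append([sum(t[c] for t in toks) / len(toks) for c in range(d)])
    flat = [fmap[r][c] for r in range(h) for c in range(w)]
    res = attend(flat, pooled, pooled)  # assumption: full-resolution queries
    return [res[r * w:(r + 1) * w] for r in range(h)]

def hilo(fmap, s=2):
    """Concatenate the two branch outputs channel-wise (an assumed fusion rule)."""
    hi, lo = hi_branch(fmap, s), lo_branch(fmap, s)
    return [[hv + lv for hv, lv in zip(hr, lr)] for hr, lr in zip(hi, lo)]
```

Under these assumptions, a 4 x 4 feature map with 2 channels per position yields a 4 x 4 output with 4 channels per position, the first two from the high-frequency branch and the last two from the low-frequency branch. Because the high-frequency branch attends only within a window, changing a token in one window leaves that branch's output in every other window unchanged, whereas the low-frequency branch lets every position see a pooled summary of the whole map.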