Existing remote sensing change detection methods struggle to segment irregular building boundaries accurately in complex urban environments. To address this, this paper proposes BuildingCDNet, a building change detection network based on multilevel geometric representation optimization using frame fields. The proposed method employs an encoder–decoder architecture with multi-scale feature aggregation, leveraging contextual information to capture buildings of varying sizes in the imagery. Cross-attention mechanisms are incorporated to strengthen the feature correlations between the two images of each bi-temporal pair. Additionally, a frame field is introduced into the network to model the complex geometric structure of building targets: by learning the local orientation of building structures, the frame field captures the geometry of complex building outlines. During training, a multi-task learning strategy aligns the predicted frame field with the ground-truth building outline while jointly learning the overall segmentation, edge outline, and corner point features of the building, which improves the accuracy of the building polygon representation. Furthermore, a discriminative loss function is constructed through multi-task learning to optimize the polygonal structured information of the building targets. The proposed method achieves state-of-the-art results on two commonly used datasets.
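To make the frame field alignment idea concrete, the following is a minimal sketch and not the BuildingCDNet implementation. It assumes the parameterization commonly used in the frame field learning literature: a frame {±u, ±v} at each pixel is encoded by the complex coefficients of f(z) = (z² − u²)(z² − v²) = z⁴ + c₂z² + c₀, and alignment is measured by |f(τ)|², where τ is the unit tangent of the ground-truth outline. The function names `frame_field_coeffs` and `align_loss` and the edge-weighting scheme are hypothetical, chosen only for illustration.

```python
import numpy as np

def frame_field_coeffs(u, v):
    """Return (c0, c2) of f(z) = z^4 + c2*z^2 + c0, whose roots are +-u and +-v.

    u and v are complex numbers encoding the two frame directions
    (assumed parameterization, not taken from the paper).
    """
    c0 = (u ** 2) * (v ** 2)
    c2 = -(u ** 2 + v ** 2)
    return c0, c2

def align_loss(c0, c2, tangent_angle, edge_weight):
    """Weighted mean of |f(e^{i*theta})|^2 over pixels.

    tangent_angle: ground-truth outline tangent angle per pixel (radians).
    edge_weight:   per-pixel weight, e.g. 1 on outline pixels and 0 elsewhere.
    The loss is zero when every tangent coincides with one frame direction.
    """
    z = np.exp(1j * np.asarray(tangent_angle, dtype=float))
    f = z ** 4 + c2 * z ** 2 + c0
    w = np.asarray(edge_weight, dtype=float)
    return float(np.sum(w * np.abs(f) ** 2) / max(np.sum(w), 1e-8))

# Axis-aligned frame: u points along x, v along y.
c0, c2 = frame_field_coeffs(1 + 0j, 1j)
aligned = align_loss(c0, c2, [0.0, np.pi / 2], [1.0, 1.0])   # tangents match the frame
misaligned = align_loss(c0, c2, [np.pi / 4], [1.0])          # tangent at 45 degrees
```

Under these assumptions, a tangent lying along either frame direction contributes no loss, while a tangent halfway between the two directions is penalized most. This is what lets one field represent both edges meeting at a building corner.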