Accurate building extraction holds paramount importance in various applications such as urbanization rate calculations, urban planning, and resource allocation. In response to the escalating demand for precise low-altitude unmanned aerial vehicle (UAV) building segmentation in intricate scenarios, this study introduces a semi-supervised methodology to alleviate the labor-intensive process of procuring pixel-level annotations. Within the framework of adversarial networks, we employ a dual-channel parallel generator strategy that amalgamates the morphology-driven optical flow estimation channel with an enhanced multilayer sensing Deeplabv3+ module. This approach aims to comprehensively capture both the morphological attributes and textural intricacies of buildings while mitigating the dependency on annotated data. To further enhance the network’s capability to discern building features, we introduce an adaptive attention mechanism via a feature fusion module. Additionally, we implement a composite loss function to augment the model’s sensitivity to building structures. Across two distinct low-altitude UAV datasets within the domain of UAV-based building segmentation, our proposed method achieves average mean pixel intersection-over-union (mIoU) ratios of 82.69% and 79.37%, respectively, with unlabeled data constituting 70% of the overall dataset. These outcomes signify noteworthy advancements compared with contemporaneous networks, underscoring the robustness of our approach in tackling intricate building segmentation challenges in the domain of UAV-based architectural analysis.