To address the low accuracy of building segmentation caused by poor regional continuity and blurred boundaries in remote sensing images, a building semantic segmentation algorithm based on multi-scale regional consistency attention supervision is proposed. First, building on the U-Net encoder-decoder architecture, the algorithm constructs a region attention network (ReA-Net), which uses a multi-scale receptive-field guidance model to attend simultaneously to the regional features and edge details of objects in remote sensing images. Second, a self-attention mechanism is used to establish a correlation representation of region-level features, and multi-scale regional attention features are obtained through weighted region-level correlation mapping. Finally, because segmentation predictions otherwise lack spatial correlation constraints, a loss function with multi-scale neighborhood consistency supervision is introduced to constrain the consistency of pixel label assignments within a local region. Experimental results on the WHU Building Dataset show an Intersection over Union (IoU) of 91.6%, precision of 95.61%, recall of 95.68%, and F1-score of 95.64%; on the Massachusetts building dataset, IoU reaches 74.77%, precision 83.93%, recall 87.53%, and F1-score 85.69%. These results indicate that the proposed algorithm achieves good segmentation quality and is robust for building segmentation in remote sensing images.
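
The four reported metrics can be illustrated with a minimal sketch, assuming binary building masks (1 = building, 0 = background) and the standard pixel-level definitions. The function names `confusion_counts` and `building_metrics` are hypothetical and not taken from the paper, and the abstract does not state whether scores are computed per image or over the whole dataset.

```python
def confusion_counts(pred, truth):
    """Count true positives, false positives and false negatives.

    `pred` and `truth` are equal-shaped nested lists of 0/1 pixel labels
    (an assumed input format, chosen only to keep the sketch dependency-free).
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # building pixel predicted as building
            elif p and not t:
                fp += 1      # background pixel predicted as building
            elif t and not p:
                fn += 1      # building pixel missed by the prediction
    return tp, fp, fn


def building_metrics(pred, truth):
    """Return IoU, precision, recall and F1 for the building class."""
    tp, fp, fn = confusion_counts(pred, truth)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"iou": iou, "precision": precision, "recall": recall, "f1": f1}


# Toy 2x3 example: 2 true positives, 1 false positive, 1 false negative.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
metrics = building_metrics(pred, truth)
```

As a sanity check on the reported numbers, the harmonic mean of the WHU precision and recall (95.61% and 95.68%) rounds to the reported F1-score of 95.64%, and the same holds for the Massachusetts values (83.93% and 87.53% give 85.69%).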