Building extraction from high-resolution remote sensing images is of great importance for urban planning, disaster assessment, and geographic mapping. In recent years, convolutional neural networks (CNNs) have substantially improved the precision of building extraction. However, most existing approaches still have problems, such as insufficient extraction of detailed features and neglect of the relationships between different features. In this study, we propose a novel multi-channel recurrent attention network (MCANet) for building extraction. First, a multi-scale channel attention mechanism (MS-CAM) expands the receptive field of the convolution kernels, enabling the model to extract rich feature information from building regions. Second, a spatial pyramid recurrent block (SPR-Block) establishes long-range dependencies across the spatial, channel, and layer dimensions of different convolutions. Finally, a multi-channel feature fusion block (MCFF-Block) fuses the multi-scale channel feature information to improve building extraction precision. Experimental results show that MCANet achieves better results than other state-of-the-art (SOTA) approaches: its recall, precision, IoU, and F1 score on the IAILD dataset are 89.82%, 94.38%, 87.42%, and 88.25%, respectively.
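The four reported metrics are standard pixel-level measures for binary segmentation. The abstract does not state exactly how they are computed, so the sketch below is only an illustration under stated assumptions: each mask is a flat sequence of 0/1 pixel labels (1 = building), and true positive (TP), false positive (FP), and false negative (FN) counts are pooled over all pixels. The function name `building_metrics` is hypothetical and not part of MCANet.

```python
from typing import Dict, Sequence


def building_metrics(pred: Sequence[int], truth: Sequence[int]) -> Dict[str, float]:
    """Hypothetical helper: pixel-level precision, recall, IoU, and F1 for binary masks.

    Assumes both masks are flattened to equal-length sequences of 0/1 labels,
    where 1 marks a building pixel and 0 marks background.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must contain the same number of pixels")

    # Count the confusion-matrix cells for the building class.
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)

    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "iou": iou, "f1": f1}


# Toy example: six pixels with TP = 2, FP = 1, FN = 1.
metrics = building_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
print(metrics)  # precision = recall = F1 = 2/3, IoU = 0.5
```

When counts are pooled over the whole evaluation set in this way, F1 and IoU are tied by F1 = 2·IoU / (1 + IoU). Averaging the metrics per image instead generally breaks that identity, so the aggregation choice matters when comparing reported numbers.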