Building footprint maps are vital to many remote sensing (RS) applications, such as 3-D building modeling, urban planning, and disaster management. Because buildings are complex, generating building footprints accurately and reliably from RS imagery remains challenging. In this article, we propose an end-to-end building footprint generation approach that integrates a convolutional neural network (CNN) and a graph model. The CNN serves as the feature extractor, while the graph model takes spatial correlation into account. Moreover, we propose to implement a feature pairwise conditional random field (FPCRF) as the graph model to preserve sharp boundaries and fine-grained segmentation. Experiments are conducted on four different data sets: 1) Planetscope satellite imagery of the cities of Munich, Paris, Rome, and Zurich; 2) ISPRS Benchmark data from the city of Potsdam; 3) the Dstl Kaggle data set; and 4) Inria Aerial Image Labeling data of Austin, Chicago, Kitsap County, Western Tyrol, and Vienna. The results show that the proposed end-to-end framework, with the FPCRF as the graph model, further improves the accuracy of building footprint generation compared with using a CNN alone, which is the current state of the art.