Accurate road maps are fundamental to a wide range of applications, such as navigation, transportation, and urban planning. With the rapid development of the Global Positioning System (GPS) and Remote Sensing (RS), abundant and continuously updated spatiotemporal data are available for road map building. However, relying on a single data source inevitably limits road recognition performance because of inherent defects in each source: trajectories suffer from noise, low sampling rates, and uneven density distribution, while roads in RS images are often occluded. This paper integrates GPS trajectories and RS images to build a road map composed of intersections and links. First, both dynamic and static trajectory characteristics are used to capture the road features indicated by trajectories, while the road features implied by RS images are extracted via transfer learning with an advanced deep neural network. Then, the two distinct types of road features are combined to learn the potential area of road centerlines with a variant of U-Net that is refined to incorporate the structural characteristics of roads. Finally, a multistep refinement extracts road centerlines through morphological processing, derives intersections through connectivity analysis, and detects links with a sliding window. In addition, inspired by vector rasterization, a fast and automatic strategy is developed to create a large-scale road dataset, significantly reducing labor and time. Compared with three state-of-the-art segmentation networks, the proposed network achieves the highest correctness and quality. The experimental results demonstrate that trajectory data and RS images complement each other, ensuring that the extracted road network possesses both integrity and connectivity.
INDEX TERMS Road map generation, crowdsourcing trajectory data, remote sensing image, data fusion.
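The abstract does not specify how the connectivity analysis derives intersections from the extracted centerlines. The sketch below shows one common heuristic. It assumes the centerline is a one-pixel-wide binary skeleton stored as a nested list, which is an assumption and not the authors' stated data format. Each skeleton pixel's 8-connected neighbours are counted. A pixel with one neighbour is a road endpoint, and a pixel with three or more neighbours is a junction candidate. Because several adjacent pixels around a true crossing usually qualify, the candidates are merged by flood fill and each cluster is reported as its centroid. The function names `neighbours` and `find_nodes` are hypothetical illustrations, not the paper's code.

```python
from collections import deque
from typing import List, Set, Tuple

Pixel = Tuple[int, int]

# 8-connected neighbourhood offsets.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
           (0, -1),           (0, 1),
           (1, -1),  (1, 0),  (1, 1)]


def neighbours(skel: List[List[int]], r: int, c: int) -> List[Pixel]:
    """Return the foreground 8-neighbours of pixel (r, c)."""
    rows, cols = len(skel), len(skel[0])
    out = []
    for dr, dc in OFFSETS:
        rr, cc = r + dr, c + dc
        if 0 <= rr < rows and 0 <= cc < cols and skel[rr][cc]:
            out.append((rr, cc))
    return out


def find_nodes(skel: List[List[int]]) -> Tuple[Set[Pixel], List[Tuple[float, float]]]:
    """Return (endpoints, intersections) of a one-pixel-wide binary skeleton.

    Hypothetical helper: endpoints have exactly one skeleton neighbour, and
    pixels with three or more neighbours are junction candidates. Adjacent
    candidates are merged with an 8-connected flood fill, and each cluster
    is reported as its (row, col) centroid.
    """
    endpoints: Set[Pixel] = set()
    candidates: Set[Pixel] = set()
    for r, row in enumerate(skel):
        for c, v in enumerate(row):
            if not v:
                continue
            degree = len(neighbours(skel, r, c))
            if degree == 1:
                endpoints.add((r, c))
            elif degree >= 3:
                candidates.add((r, c))

    # Connectivity analysis: group touching junction candidates into clusters.
    intersections: List[Tuple[float, float]] = []
    unvisited = set(candidates)
    while unvisited:
        seed = unvisited.pop()
        cluster = [seed]
        queue = deque([seed])
        while queue:
            r, c = queue.popleft()
            for nb in neighbours(skel, r, c):
                if nb in unvisited:
                    unvisited.remove(nb)
                    cluster.append(nb)
                    queue.append(nb)
        n = len(cluster)
        intersections.append((sum(p[0] for p in cluster) / n,
                              sum(p[1] for p in cluster) / n))
    intersections.sort()  # deterministic order regardless of set iteration
    return endpoints, intersections


if __name__ == "__main__":
    # A "+"-shaped skeleton: four road arms meeting at one crossing.
    plus = [[0, 0, 1, 0, 0],
            [0, 0, 1, 0, 0],
            [1, 1, 1, 1, 1],
            [0, 0, 1, 0, 0],
            [0, 0, 1, 0, 0]]
    ends, nodes = find_nodes(plus)
    print(sorted(ends))  # [(0, 2), (2, 0), (2, 4), (4, 2)]
    print(nodes)         # [(2.0, 2.0)]
```

In the "+" example, five pixels (the centre and its four direct neighbours) each have at least three skeleton neighbours. The clustering step is therefore what collapses them into a single intersection. Links could then be traced along the remaining skeleton pixels between the detected nodes. The sliding-window link detection that the abstract mentions is not reproduced here.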