Mapping population distribution at fine spatial scales is fundamental to resource utilization, urban disaster assessment, environmental regulation, and urbanization. Multi-source data from remote and social sensing have been widely used to disaggregate census information and map population distribution at fine resolution. However, achieving accurate high-resolution population mapping remains challenging, because it requires combining multi-source data while accounting for geographic spatial heterogeneity. Existing approaches do not consider global and local spatial information simultaneously, which results in low accuracy. This paper proposes a multi-model fusion neural network for estimating fine-resolution population from multi-source data. Our approach takes into account both the local spatial information and the global information of each geographic unit. Specifically, a first-order space matrix of a geographic unit is used to characterize its local spatial information, and the network combines a convolutional neural network (CNN) with a multilayer perceptron (MLP) to estimate the fine-resolution population distribution. Using Shenzhen, China, as the study area, we generated a population distribution map at 100 m spatial resolution. In quantitative validation at the township level, the model's estimates agreed with census population (R² = 0.77) more closely than those of the WorldPop dataset (R² = 0.51) and the MLP-based model (R² = 0.63). Qualitatively, the proposed model identifies differences in population density within densely populated areas, as well as some remote population clusters, more accurately than the WorldPop population dataset.
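
To make the idea of a first-order space matrix concrete, the following minimal Python sketch extracts one for a grid cell. It rests on assumptions that the abstract does not state: that "first-order" means the 3 x 3 block formed by a cell and its eight immediate neighbors, that cells on the boundary are padded by repeating the nearest edge value, and that the block would feed a CNN-style branch while the cell's own value feeds an MLP-style branch. The function names and the single-band feature grid are hypothetical and are not taken from the paper.

```python
def first_order_patch(grid, row, col):
    """Return the 3x3 block of values centred on (row, col).

    Neighbours that fall outside the grid are replaced by the nearest
    edge cell (an assumed padding choice, not the paper's).
    """
    n_rows, n_cols = len(grid), len(grid[0])
    patch = []
    for dr in (-1, 0, 1):
        # Clamp the row index so it stays inside the grid.
        r = min(max(row + dr, 0), n_rows - 1)
        patch_row = []
        for dc in (-1, 0, 1):
            # Clamp the column index in the same way.
            c = min(max(col + dc, 0), n_cols - 1)
            patch_row.append(grid[r][c])
        patch.append(patch_row)
    return patch


def model_inputs(grid, row, col):
    """Pair the local 3x3 block with the cell's own value.

    Under the assumptions above, the first element would go to a
    CNN-style branch and the second to an MLP-style branch.
    """
    return first_order_patch(grid, row, col), grid[row][col]


# Hypothetical single-band feature grid (one value per grid cell).
grid = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]

local_patch, cell_value = model_inputs(grid, 1, 1)
print(local_patch)  # [[1, 2, 3], [5, 6, 7], [9, 10, 11]]
print(cell_value)   # 6

corner_patch, _ = model_inputs(grid, 0, 0)
print(corner_patch)  # [[1, 1, 2], [1, 1, 2], [5, 5, 6]]
```

With real multi-source data each cell would carry a vector of features rather than a single number, so the block would have shape 3 x 3 x (number of features); the indexing logic stays the same.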