Trip generation prediction is essential for effective traffic engineering and urban management. Traditional methods operate at large spatial scales (e.g., the traffic analysis zone, TAZ) and rely on a single data source with few data types. Because personal trip survey data are highly aggregated, fine-grained analysis of smaller spatial units is difficult. In addition, experience-based models cannot easily capture complex non-linear relationships, which limits their accuracy. Multi-source data make it possible to improve the accuracy of trip generation prediction. Using point of interest (POI) data, the study area is subdivided into more disaggregate spatial units (grid cells), and spatial correlations at the grid scale are taken into account. This paper proposes a Convolutional Neural Network-Multi-Dimensional Long Short-Term Memory (CNN-MDLSTM) model to analyze the spatial correlation between trip generation and land use features. The convolution structure captures salient features within a spatial neighborhood, and the sequence-transfer structure of the Long Short-Term Memory (LSTM) network describes spatial interactions. A variant of the Multi-Dimensional LSTM (MDLSTM) is used to adapt the model to two-dimensional spatial relationships. A case study and a comparison across models show that CNN-MDLSTM characterizes the quantitative relationship between trip generation and land use features better than the other neural network models compared. The study also discusses prediction accuracy for grid cells with different land uses and the impact of different land use characteristics on prediction accuracy.
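To make the architecture described above more concrete, the following is a minimal forward-pass sketch, not the authors' implementation. It assumes a grid of POI counts per category as input, a single 3x3 convolution with ReLU, and the standard two-dimensional MDLSTM cell (one forget gate per predecessor direction, scanned from the top-left corner), followed by a linear readout that gives one trip-generation value per grid cell. The abstract does not specify the kernel size, hidden size, scan direction, gate layout, or readout, so these choices, the function names `conv2d_same`, `mdlstm_2d`, `init_params`, and `predict_trips`, and the random untrained weights are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv2d_same(grid, kernels):
    """Zero-padded, stride-1 convolution with ReLU (hypothetical CNN stage).

    grid: (H, W, C_in) POI features per cell; kernels: (k, k, C_in, C_out), k odd.
    """
    k = kernels.shape[0]
    p = k // 2
    H, W, _ = grid.shape
    padded = np.pad(grid, ((p, p), (p, p), (0, 0)))
    out = np.zeros((H, W, kernels.shape[3]))
    for i in range(H):
        for j in range(W):
            patch = padded[i:i + k, j:j + k, :]              # (k, k, C_in) neighborhood
            out[i, j] = np.tensordot(patch, kernels, axes=3)  # (C_out,)
    return np.maximum(out, 0.0)


def mdlstm_2d(x, params, hidden):
    """Standard 2D MDLSTM scan (assumed variant): each cell sees its upper and left neighbors."""
    H, W, _ = x.shape
    h = np.zeros((H + 1, W + 1, hidden))  # row 0 and column 0 hold zero boundary states
    c = np.zeros((H + 1, W + 1, hidden))
    Wx, Wu, Wl, b = params["Wx"], params["Wu"], params["Wl"], params["b"]
    for i in range(1, H + 1):
        for j in range(1, W + 1):
            z = x[i - 1, j - 1] @ Wx + h[i - 1, j] @ Wu + h[i, j - 1] @ Wl + b
            ig, f_up, f_left, og, g = np.split(z, 5)  # input, two forget, output, candidate
            c[i, j] = (sigmoid(ig) * np.tanh(g)
                       + sigmoid(f_up) * c[i - 1, j]
                       + sigmoid(f_left) * c[i, j - 1])
            h[i, j] = sigmoid(og) * np.tanh(c[i, j])
    return h[1:, 1:]  # (H, W, hidden)


def init_params(c_in, c_conv, hidden, k=3, seed=0):
    """Random, untrained weights for illustration only."""
    rng = np.random.default_rng(seed)
    s = 0.1
    return {
        "K": rng.normal(0, s, (k, k, c_in, c_conv)),
        "Wx": rng.normal(0, s, (c_conv, 5 * hidden)),
        "Wu": rng.normal(0, s, (hidden, 5 * hidden)),
        "Wl": rng.normal(0, s, (hidden, 5 * hidden)),
        "b": np.zeros(5 * hidden),
        "Wo": rng.normal(0, s, (hidden,)),
        "bo": 0.0,
    }


def predict_trips(poi_grid, params, hidden):
    feats = conv2d_same(poi_grid, params["K"])  # local land-use features
    h = mdlstm_2d(feats, params, hidden)         # spatial interaction across the grid
    return h @ params["Wo"] + params["bo"]       # one predicted value per cell, (H, W)


if __name__ == "__main__":
    # Hypothetical input: a 6 x 8 grid with 4 POI categories per cell.
    poi = np.random.default_rng(1).random((6, 8, 4))
    params = init_params(c_in=4, c_conv=8, hidden=16)
    print(predict_trips(poi, params, hidden=16).shape)  # (6, 8)
```

Because this scan runs in one direction only, a cell's output depends on its own convolutional neighborhood and on cells above and to the left of it. Models that need context from all sides typically combine several scans in different directions; whether the CNN-MDLSTM does so is not stated in the abstract.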