Abstract. Accurate crime prediction plays an important role in public safety, providing technical guidance and decision support for police and government departments. Because crime distributions are dynamic and imbalanced, building predictive models for them is difficult. In particular, the fine-grained, non-linear spatiotemporal dependencies in crime data are hard to capture accurately. In this paper, we propose ST-ACLCrime, a neural network model based on ConvLSTM and the SE block, to predict the number of theft crimes in hotspot areas. Stacked ConvLSTM layers capture fine-grained spatiotemporal dependencies while preserving spatial location information. To further enhance the global channel feature representation, an SE block recalibrates the channel features and strengthens the inter-dependencies between channels. In addition, closeness and period components are used to dynamically capture dependencies on different temporal trends. We choose the city of Chicago as the case study and divide the whole city area using a multi-level spatial grid. The experimental results show that the proposed model outperforms all baseline models, namely HA, CNN, LSTM, CNN-LSTM and ConvLSTM, capturing spatiotemporal dependence effectively and improving prediction accuracy.
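As a rough illustration of the channel recalibration mentioned above, the following is a minimal NumPy sketch of a generic squeeze-and-excitation step applied to a single feature map. It is not the implementation used in ST-ACLCrime: the function name `se_block`, the tensor shape (C, H, W), the reduction ratio `r`, and the randomly initialised weights are all assumptions made for the example, and in a trained model the weights would be learned.

```python
import numpy as np


def se_block(x, w1, b1, w2, b2):
    """Recalibrate the channels of a feature map x with shape (C, H, W)."""
    # Squeeze: global average pooling gives one descriptor per channel.
    z = x.mean(axis=(1, 2))                              # shape (C,)
    # Excitation: a bottleneck of two dense layers (ReLU, then sigmoid)
    # produces one weight in (0, 1) per channel.
    hidden = np.maximum(w1 @ z + b1, 0.0)                # shape (C // r,)
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden + b2)))    # shape (C,)
    # Scale: multiply every channel map by its weight.
    return x * scale[:, None, None], scale


# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
C, H, W, r = 8, 16, 16, 4
x = rng.random((C, H, W))
w1 = rng.normal(scale=0.1, size=(C // r, C))
b1 = np.zeros(C // r)
w2 = rng.normal(scale=0.1, size=(C, C // r))
b2 = np.zeros(C)

y, scale = se_block(x, w1, b1, w2, b2)
```

The output `y` has the same shape as the input, so a step of this kind can follow a ConvLSTM output without changing the spatial grid; only the relative weight of each channel changes.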