Land use/cover change (LUCC) refers to changes in the Earth’s land surface over time. Accurate LUCC prediction is crucial for guiding policy formulation and resource management, supporting the sustainable use of land, and maintaining the health of the Earth’s ecosystems. LUCC is a dynamic geographical process involving complex spatiotemporal dependencies. However, existing LUCC simulation models learn spatiotemporal features insufficiently, and traditional cellular automaton (CA) models are limited by the local nature of neighborhood effects. This study proposes a CA model based on spatiotemporal feature learning and hotspot area pre-allocation (VST-PCA). The model uses the Video Swin Transformer to learn transformation rules, capturing the spatiotemporal dependencies inherent in LUCC more accurately. In addition, a pre-allocation strategy is introduced into the CA simulation to address the local constraints of neighborhood effects, thereby improving simulation accuracy. Using the Chongqing metropolitan area as the study area, two traditional CA models and two deep learning-based CA models were constructed as baselines to validate the performance of the VST-PCA model. The proposed VST-PCA model achieved Kappa and figure of merit (FOM) values of 0.8654 and 0.4534, respectively. Compared with the other four models, Kappa increased by 0.0322–0.1036 and FOM increased by 0.0513–0.1649. This study provides an accurate and effective method for LUCC simulation, offering valuable insights for future research and land management planning.
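The two accuracy measures reported above can be illustrated with a minimal sketch. It assumes the commonly used definitions: Cohen's Kappa computed from observed versus simulated end-date maps, and the figure of merit (FOM) computed as hits divided by the sum of hits, misses, wrong-category changes, and false alarms relative to the initial map. The abstract does not state the exact formulation used in the study, so the function names `cohen_kappa` and `figure_of_merit` and the toy arrays are hypothetical and serve for illustration only.

```python
import numpy as np


def cohen_kappa(observed, simulated):
    """Cohen's Kappa between two categorical maps (assumed standard definition)."""
    observed = np.asarray(observed).ravel()
    simulated = np.asarray(simulated).ravel()
    classes = np.union1d(observed, simulated)
    # Observed agreement: share of cells with identical classes.
    po = np.mean(observed == simulated)
    # Expected chance agreement from the marginal class proportions.
    pe = sum(np.mean(observed == c) * np.mean(simulated == c) for c in classes)
    if pe == 1.0:
        return float("nan")  # undefined when both maps contain a single class
    return float((po - pe) / (1.0 - pe))


def figure_of_merit(initial, observed, simulated):
    """FOM = hits / (misses + hits + wrong-category changes + false alarms)."""
    initial, observed, simulated = (
        np.asarray(a).ravel() for a in (initial, observed, simulated)
    )
    obs_change = observed != initial   # cells that actually changed
    sim_change = simulated != initial  # cells the model changed
    hits = np.sum(obs_change & sim_change & (observed == simulated))
    wrong = np.sum(obs_change & sim_change & (observed != simulated))
    misses = np.sum(obs_change & ~sim_change)
    false_alarms = np.sum(~obs_change & sim_change)
    total = hits + wrong + misses + false_alarms
    return float(hits / total) if total else float("nan")


# Toy six-cell example with hypothetical class codes 0, 1, 2.
initial = [0, 0, 0, 0, 1, 1]
observed = [1, 1, 0, 0, 1, 2]
simulated = [1, 0, 1, 0, 1, 1]

kappa = cohen_kappa(observed, simulated)
fom = figure_of_merit(initial, observed, simulated)
print(f"Kappa = {kappa:.4f}, FOM = {fom:.4f}")
```

Because FOM considers only cells that changed in either the observed or the simulated map, it is typically much lower than Kappa, which is dominated by the large share of persistent cells.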