Summary
Real‐time crowd counting in underground metro systems, where space is limited, is crucial for managing heavy pedestrian volumes and responding promptly to emergencies. To address this challenge, we propose a passenger state transition‐based model, called STRmt, for continuous real‐time monitoring of crowd movement within the service areas of stations and trains, using the automatic fare collection (AFC) system as a comprehensive sensor network. Our key contribution is to model the movement of passengers through a metro system over time as a state transition process aligned with the train schedule. To predict these state transitions dynamically, we introduce a spatio‐temporal deep learning framework, denoted STnet. We evaluate the method through extensive experiments spanning 2 years in Shenzhen, China, using AFC data, train schedule data, and weather data. The results show that the proposed method outperforms baseline methods, achieving an estimation precision of 0.92.