Smart card data from Automatic Fare Collection (AFC) systems and timetable information, such as Automatic Vehicle Location (AVL) data, are used in combination by practitioners and researchers to gain a deeper understanding of the public transit network. In some cases, AVL data are not available because records are missing from the system. In such cases, practitioners resort to the scheduled timetable, such as the General Transit Feed Specification (GTFS), to match smart card data to the transit network. Since delays and changes to the timetable are not contained in the scheduled timetable, this can result in wrong matches between the smart card data and the transit network. This paper shows how the uncertainty of arrival and departure times affects passenger-to-train assignments and proposes a method for estimating the missing arrival times of trains when recorded timetable information is not available. The method uses knowledge of how tap-outs are distributed in a hierarchical latent Bayesian model to predict the arrival times of trains. Evaluated on 15,136 train arrivals, the model can infer 70% of the arrival times with an average error of 28 to 32 seconds, depending on the station.
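To make the underlying intuition concrete, the sketch below shows a deliberately simple heuristic, not the hierarchical Bayesian model proposed in the paper: tap-outs at a station tend to arrive in bursts shortly after a train arrives, so grouping tap-out timestamps into bursts gives a rough arrival estimate. The function name `estimate_arrivals` and the parameters `gap`, `walk_offset`, and `min_taps` are hypothetical, and their default values are illustrative assumptions rather than values taken from the paper.

```python
from typing import List


def estimate_arrivals(
    tap_outs: List[float],
    gap: float = 90.0,          # assumed: silence (s) that separates two bursts
    walk_offset: float = 30.0,  # assumed: platform-to-gate walking time (s)
    min_taps: int = 3,          # assumed: minimum taps to count as a train
) -> List[float]:
    """Group tap-out times (seconds) into bursts and return one
    estimated train arrival time per burst."""
    arrivals: List[float] = []
    burst: List[float] = []
    for t in sorted(tap_outs):
        # A long pause since the previous tap-out closes the current burst.
        if burst and t - burst[-1] > gap:
            if len(burst) >= min_taps:
                # First tap-out minus walking time approximates the arrival.
                arrivals.append(burst[0] - walk_offset)
            burst = []
        burst.append(t)
    # Close the final burst.
    if len(burst) >= min_taps:
        arrivals.append(burst[0] - walk_offset)
    return arrivals


taps = [100, 105, 110, 130, 400, 405, 412, 420, 900]
print(estimate_arrivals(taps))  # [70.0, 370.0]
```

The isolated tap-out at 900 seconds is discarded because it does not reach `min_taps`. A fixed gap and a fixed walking offset ignore station-specific variation, which is the kind of uncertainty a probabilistic model of the tap-out distribution is better suited to capture.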