Fatigue among urban railway transit (URT) drivers impairs their performance and is a contributing factor in many railway accidents and incidents. This paper aims to develop a robust fatigue detection system for URT drivers. An experimental study involving 198 professional URT drivers was conducted under actual working conditions to provide authentic and representative data. Fatigue scores based on the Karolinska Sleepiness Scale served as the ground truth, and heart rate variability (HRV) data were collected using wearable photoplethysmography (PPG) sensors. Statistical analysis showed that continuous working hours were a major factor in driver fatigue and that HRV features could differentiate between fatigue levels. Four classifiers (k-nearest neighbors, Naive Bayes, support vector machines, and random forests) were trained to detect fatigue in real time, for both binary and three-class fatigue classification. For binary classification, the random forest classifier with the corrected feature set as input performed best, with an accuracy of 92.5%. For three-class classification, however, accuracy dropped by 8 to 27 percentage points. Moreover, the corrected feature set, which circumvents inter-individual variability in HRV, improved the performance of the fatigue classifiers. These findings could contribute to the development of a robust, real-time URT driver fatigue detection system and to the improvement of current URT operational safety regulations.
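The abstract does not specify how the corrected feature set is computed. As an illustration only, one common way to reduce inter-individual variability is to standardize each HRV feature within each driver, so that a classifier sees deviations from that driver's own baseline rather than absolute values. The sketch below assumes this per-driver z-scoring; the function name `correct_per_driver`, the record layout, and the sample values are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from statistics import mean, pstdev


def correct_per_driver(records):
    """Z-score each HRV feature within each driver (assumed correction).

    `records` is a list of (driver_id, feature_vector) pairs, where every
    feature vector has the same length. Returns pairs in the same order.
    """
    # Group feature vectors by driver.
    by_driver = defaultdict(list)
    for driver_id, features in records:
        by_driver[driver_id].append(features)

    # Compute per-driver mean and population standard deviation per feature.
    stats = {}
    for driver_id, rows in by_driver.items():
        columns = list(zip(*rows))
        means = [mean(col) for col in columns]
        # A zero spread would cause division by zero, so fall back to 1.0.
        stds = [pstdev(col) or 1.0 for col in columns]
        stats[driver_id] = (means, stds)

    # Express each measurement relative to the driver's own baseline.
    corrected = []
    for driver_id, features in records:
        means, stds = stats[driver_id]
        scaled = [(x - m) / s for x, m, s in zip(features, means, stds)]
        corrected.append((driver_id, scaled))
    return corrected


# Hypothetical data: two drivers, two HRV features per measurement.
records = [
    ("A", [40.0, 800.0]),
    ("A", [60.0, 900.0]),
    ("B", [20.0, 600.0]),
    ("B", [30.0, 700.0]),
]
corrected = correct_per_driver(records)
```

In this toy data, drivers A and B have very different absolute values, yet after correction their lower and higher measurements map to the same relative scores. That is the property a corrected feature set of this kind is meant to provide. In a real-time setting the per-driver statistics would have to come from earlier baseline recordings rather than from the data being classified.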