This study evaluates the spatiotemporal generalization of statistical and machine learning models for simulating built-up land expansion and compares the individual models with ensemble approaches. Six individual models (artificial neural networks, support vector machines (SVM), random forest (RF), boosted regression trees, the generalized additive model, and the lasso) and two ensemble approaches (ensemble median and ensemble weighted area under the curve) were implemented, each integrated with cellular automata. Each model was calibrated on data from 1975–1990, and its extrapolation power was evaluated for 1990–1996, 1996–2002, 2002–2011, and 2011–2017. The total operating characteristic revealed that the RF model achieved the highest calibration accuracy but also the largest performance loss during the validation periods. The SVM model had the lowest calibration accuracy, yet its performance increased during the validation periods. In the third time interval (1996–2002), the SVM model again achieved the highest accuracy. Simulation accuracy dropped sharply for all models during the fourth (2002–2011) and fifth (2011–2017) intervals. Neither ensemble approach outperformed the individual models. Further, the accuracy of built-up land expansion models drops noticeably for long-term simulations.
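As an illustration of the two ensemble approaches named above, the following is a minimal sketch rather than the implementation used in this study. It assumes that each individual model outputs a per-cell transition probability, that the ensemble median takes the cell-wise median across models, and that the weighted ensemble averages the models with weights proportional to their area under the curve (AUC). The function names, the array layout, and the numbers are hypothetical.

```python
import numpy as np


def ensemble_median(probs):
    """Cell-wise median of the model outputs.

    probs: array of shape (n_models, n_cells); each row holds one model's
    predicted probability of built-up conversion per cell (assumed layout).
    """
    return np.median(np.asarray(probs, dtype=float), axis=0)


def ensemble_weighted_auc(probs, aucs):
    """Cell-wise weighted mean with weights proportional to each model's AUC.

    Assumption: weights are the raw AUC values, normalized to sum to 1.
    The study may define the weights differently.
    """
    probs = np.asarray(probs, dtype=float)
    weights = np.asarray(aucs, dtype=float)
    # np.average divides by the sum of the weights, so no manual
    # normalization is needed here.
    return np.average(probs, axis=0, weights=weights)


# Hypothetical example: three models, two cells.
probs = np.array([
    [0.2, 0.8],
    [0.4, 0.6],
    [0.9, 0.1],
])
aucs = [0.50, 0.75, 0.75]  # illustrative values, not results from the study

median_map = ensemble_median(probs)
weighted_map = ensemble_weighted_auc(probs, aucs)
```

The median is robust to a single outlying model, whereas the AUC-weighted mean lets better-calibrated models dominate; either map can then serve as the suitability surface that the cellular automaton allocates new built-up cells from.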