Agent-based models tend to offer descriptive accuracy at the expense of analytical tractability. This trade-off adds a layer of methodological issues around empirical validation, which remains an ongoing challenge. This paper offers a replicable method to empirically validate agent-based models, together with a specific “goodness-of-validation” indicator and its statistical distribution, yielding a statistical test broadly comparable to a p-value. The method relies on an unsupervised machine-learning algorithm based on cluster analysis. It clusters the ex-post behavior of real and artificial individuals to create meso-level behavioral patterns. By assessing how evenly real and artificial agents are mixed within clusters, it produces a validation score in [0, 1] that can be judged against its statistical distribution. In sum, it is argued that an agent-based model can be initialized at the micro-level, calibrated at the macro-level, and validated at the meso-level with a single data set. As a case study, we build and use a mobility mode-choice model by configuring an agent-based simulation platform called BedDeM. We cluster the choice behavior of real and artificial individuals with the same ex-ante characteristics, and we analyze the similarity of these clusters to determine whether the model-generated data contain behavioral patterns observationally equivalent to those in the real data. The model achieves a score of 0.27, better than roughly 95% of all possible scores the indicator can produce. Drawing lessons from this example, we offer guidance for researchers who wish to validate their own models when micro-data are available.
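
To make the cluster-balance idea concrete, the following is a minimal sketch, not the paper's actual indicator: it assumes k-means as the clustering algorithm, a size-weighted deviation of each cluster's share of real agents from the pooled share as the imbalance measure, and a convention in which lower scores mean better mixing. The function name `validation_score` and the parameter `k` are hypothetical choices for illustration.

```python
# Hypothetical sketch of a cluster-balance validation score.
# Assumptions (not from the paper): k-means clustering and a
# size-weighted absolute deviation of each cluster's real-agent
# share from the pooled share; 0 = perfectly mixed clusters.
import numpy as np
from sklearn.cluster import KMeans

def validation_score(features: np.ndarray, is_real: np.ndarray, k: int = 8) -> float:
    """Cluster pooled real + artificial individuals and measure
    how evenly real agents are spread across the clusters."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)
    pooled_share = is_real.mean()        # overall fraction of real agents
    score = 0.0
    for c in np.unique(labels):
        mask = labels == c
        weight = mask.mean()             # cluster size as a fraction of the sample
        cluster_share = is_real[mask].mean()
        score += weight * abs(cluster_share - pooled_share)
    return score                         # in [0, 1]; lower = more balanced clusters
```

Under these assumptions, the score's null distribution could be obtained by repeatedly permuting the `is_real` labels and recomputing the score, which would allow a percentile judgment analogous to the "better than about 95% of all possible scores" statement above.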