In cosmology, we routinely choose between models to describe our data, and can incur biases from insufficiently flexible models or lose constraining power with overly complex ones. In this paper we propose an empirical approach to model selection that explicitly balances parameter bias against model complexity. Our method uses synthetic data to calibrate the relation between bias and the χ² difference between models. This allows us to interpret χ² values obtained from real data (even if catalogues are blinded) and choose a model accordingly. We apply our method to the problem of intrinsic alignments, one of the most significant weak lensing systematics and a major contributor to the error budget in modern lensing surveys. Specifically, we consider the example of the Dark Energy Survey Year 3 (DES Y3) analysis, and compare the commonly used nonlinear alignment (NLA) and tidal alignment & tidal torque (TATT) models. The models are calibrated against bias in the Ωₘ–S₈ plane. Once noise is accounted for, we find that it is possible to set a threshold Δχ² that guarantees, at a specified confidence level, that an analysis using NLA is unbiased at the level of Nσ. By contrast, we find that theoretically defined thresholds (based on, e.g., p-values for χ²) tend to be overly optimistic, and do not reliably rule out cosmological biases of up to ∼1–2σ. Considering the real DES Y3 cosmic shear results, based on the reported difference in χ² between the NLA and TATT analyses, we find a roughly 30% chance that, had NLA been the fiducial model, the results would have been biased (in the Ωₘ–S₈ plane) by more than 0.3σ. More broadly, the method we propose here is simple and general, and requires relatively modest resources. We foresee applications to future analyses as a model selection tool in many contexts.
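The calibration idea can be illustrated with a minimal sketch. Several assumptions are made here, and none are taken from the paper. Each synthetic realisation is assumed to yield a pair (Δχ², bias), where Δχ² = χ²_NLA − χ²_TATT and the bias is expressed in units of σ in the Ωₘ–S₈ plane. The threshold criterion used is one plausible reading of the method: among realisations whose Δχ² falls at or below the threshold, at least a fraction `confidence` must have a bias below `n_sigma`. The probability of bias at an observed Δχ² is estimated from realisations within a hypothetical `window` of the observed value. The generator `toy_simulations` is a made-up stand-in for real DES Y3 synthetic data vectors.

```python
import random


def toy_simulations(n, seed=0):
    """Hypothetical toy stand-in for synthetic analyses (not a physical model):
    larger true bias tends to produce a larger Δχ² between the two models."""
    rng = random.Random(seed)
    delta_chi2, bias_sigma = [], []
    for _ in range(n):
        b = rng.uniform(0.0, 2.0)                      # bias in units of sigma
        d = max(0.0, 10.0 * b + rng.gauss(0.0, 1.0))   # noisy Δχ², clipped at 0
        delta_chi2.append(d)
        bias_sigma.append(b)
    return delta_chi2, bias_sigma


def calibrate_threshold(delta_chi2, bias_sigma, n_sigma, confidence):
    """Largest Δχ² threshold such that, among realisations with Δχ² <= threshold,
    the fraction with bias < n_sigma is at least `confidence`. Returns None if no
    threshold satisfies the criterion."""
    pairs = sorted(zip(delta_chi2, bias_sigma))
    threshold = None
    n_unbiased = 0
    for i, (dchi2, bias) in enumerate(pairs, start=1):
        if bias < n_sigma:
            n_unbiased += 1
        # Evaluate only once every realisation sharing this Δχ² value is included.
        last_of_tie = i == len(pairs) or pairs[i][0] != dchi2
        if last_of_tie and n_unbiased / i >= confidence:
            threshold = dchi2
    return threshold


def prob_biased(delta_chi2, bias_sigma, observed, n_sigma, window):
    """Fraction of realisations with Δχ² within `window` of the observed value
    whose bias exceeds n_sigma (a simple local estimate)."""
    near = [b for d, b in zip(delta_chi2, bias_sigma) if abs(d - observed) <= window]
    if not near:
        raise ValueError("no synthetic realisations near the observed Δχ²")
    return sum(b > n_sigma for b in near) / len(near)


if __name__ == "__main__":
    d, b = toy_simulations(2000)
    t = calibrate_threshold(d, b, n_sigma=0.3, confidence=0.9)
    print("Δχ² threshold for <0.3σ bias at 90% confidence:", t)
    print("P(bias > 0.3σ | Δχ² ≈ 5):", prob_biased(d, b, 5.0, 0.3, 1.0))
```

The purity-style criterion was chosen because it answers the practical question "if my measured Δχ² passes the cut, how likely is the simpler model to be unbiased?". A real application would replace the toy generator with bias and Δχ² values measured from noisy synthetic data vectors analysed under both models.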