Like any model of real-world phenomena, soil erosion models must be tested against empirical evidence to evaluate their performance. Such testing is critical for developing knowledge of, and confidence in, model predictions. However, evaluating soil erosion models is complicated by the uncertainties involved in estimating model parameters and measuring system responses. Here, we undertake a term co-occurrence analysis to investigate how model evaluation is approached in soil erosion research. The analysis illustrates that model testing is often neglected and that model evaluation topics are segregated from current research interests. We perform a meta-analysis of model performance to understand the mechanisms that influence model predictive accuracy. The results indicate that different models do not systematically outperform each other, and that calibration seems to be the main mechanism of model improvement. We review how soil erosion models have been evaluated at different temporal and spatial scales, focusing on the methods, assumptions, and data used for model testing. We discuss the implications of uncertainty and equifinality in soil erosion models, and implement a case study of uncertainty assessment that enables models to be tested as hypotheses. We close with a comment on the way forward for the evaluation of erosion models, discussing philosophical aspects of hypothesis testing in environmental modelling. We refute the notion that soil erosion models can be validated, and emphasize the necessity of defining fit-for-purpose tests, based on multiple sources of data, that allow for a broad investigation of model usefulness and consistency.