Probabilistic forecasting models describe the aleatory variability of natural systems as well as our epistemic uncertainty about how the systems work. Testing a model against observations exposes ontological errors in the representation of a system and its uncertainties. We clarify several conceptual issues regarding the testing of probabilistic forecasting models for ontological errors: the ambiguity of the aleatory/epistemic dichotomy, the quantification of uncertainties as degrees of belief, the interplay between Bayesian and frequentist methods, and the scientific pathway for capturing predictability. We show that testability of the ontological null hypothesis derives from an experimental concept, external to the model, that identifies collections of data, observed and not yet observed, that are judged to be exchangeable when conditioned on a set of explanatory variables. These conditional exchangeability judgments specify observations with well-defined frequencies. Any model predicting these behaviors can thus be tested for ontological error by frequentist methods; e.g., using P values. In the forecasting problem, prior predictive model checking is preferable to posterior predictive checking because it provides more severe tests. We illustrate experimental concepts using examples from probabilistic seismic hazard analysis. Severe testing of a model under an appropriate set of experimental concepts is the key to model validation, in which we seek to know whether a model replicates the data-generating process well enough to be sufficiently reliable for some useful purpose, such as long-term seismic forecasting. Pessimistic views of system predictability fail to recognize the power of this methodology in separating predictable behaviors from those that are not.

system science | Bayesian statistics | significance testing | subjective probability | expert opinion

Science is rooted in the concept that a model can be tested against observations and rejected when necessary (1). However, the problem of model testing becomes formidable when we consider natural systems. Owing to their scale, complexity, and openness to interactions within a larger environment, most natural systems cannot be replicated in the laboratory, and direct observations of their inner workings are always inadequate. These difficulties raise serious questions about the meaning and feasibility of "model verification" and "model validation" (2), and have led to the pessimistic view that "the outcome of natural processes in general cannot be accurately predicted by mathematical models" (3).

Uncertainties in the formal representation of natural systems imply that the forecasting of emergent phenomena such as natural hazards must be based on probabilistic rather than deterministic modeling. The ontological framework for most probabilistic forecasting models comprises two types of uncertainty: an aleatory variability that describes the randomness of the system, and an epistemic uncertainty that characterizes our lack of knowledge about the system.
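As a minimal formal sketch of this dichotomy (the notation is ours, introduced for illustration rather than taken from the text above): if an infinite sequence of observations $x_1, x_2, \ldots$ is judged exchangeable, de Finetti's representation theorem expresses their joint distribution as a mixture of independent, identically distributed models,
\[
p(x_1, \ldots, x_n) \;=\; \int_{\Theta} \prod_{i=1}^{n} f(x_i \mid \theta)\, \pi(\theta)\, \mathrm{d}\theta,
\]
where the likelihood $f(x \mid \theta)$ plays the role of the aleatory variability and the distribution $\pi(\theta)$ quantifies the epistemic uncertainty about which member of the family governs the system. Under each fixed $\theta$, the observations have well-defined frequencies, which is what makes such a model amenable to frequentist testing.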
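On the same schematic footing, the preference for prior predictive checking stated in the abstract can be illustrated by the standard construction of a prior predictive P value (again an assumed formulation in our notation, not a formula from the original): a test statistic $T$ is referred to the prior predictive distribution,
\[
p(x) \;=\; \int_{\Theta} f(x \mid \theta)\, \pi(\theta)\, \mathrm{d}\theta,
\qquad
p_T \;=\; \Pr_{X \sim p}\!\left[\, T(X) \ge T(x^{\mathrm{obs}}) \,\right],
\]
so that a very small $p_T$ signals an ontological error: data of the kind actually observed are improbable under the model as a whole, likelihood and prior together, rather than merely under a posterior already fitted to those data.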