We review the field of data assimilation (DA) from a Bayesian perspective and show that, in addition to its by now common application to state estimation, DA may be used for model selection. An important special case of the latter is the discrimination between a factual model -which corresponds, to the best of the modeler's knowledge, to the situation in the actual world in which a sequence of events has occurred -and a counterfactual model, in which a particular forcing or process might be absent or just quantitatively different from the actual world. Three different ensemble-DA methods are reviewed for this purpose: the ensemble Kalman filter (EnKF), the ensemble four-dimensional variational smoother (En-4D-Var), and the iterative ensemble Kalman smoother (IEnKS). An original contextual formulation of model evidence (CME) is introduced. It is shown how to apply these three methods to compute CME, using the approximated time-dependent probability distribution functions (pdfs) each of them provide in the process of state estimation. The theoretical formulae so derived are applied to two simplified nonlinear and chaotic models: (i) the Lorenz three-variable convection model (L63), and (ii) the Lorenz 40-variable mid-latitude atmospheric dynamics model (L95). The numerical results of these three DA-based methods and those of an integration based on importance sampling are compared. It is found that better CME estimates are obtained by using DA, and the IEnKS method appears to be best among the DA methods. Differences among the performance of the three DA-based methods are discussed as a function of model properties. Finally, the methodology is implemented for parameter estimation and for event attribution.