Simulation models represent soil organic carbon (SOC) dynamics in global carbon (C) cycle scenarios to support climate‐change studies. It is imperative to increase confidence in long‐term predictions of SOC dynamics by reducing the uncertainty in model estimates. We evaluated SOC simulated from an ensemble of 26 process‐based C models by comparing simulations to experimental data from seven long‐term bare‐fallow (vegetation‐free) plots at six sites: Denmark (two sites), France, Russia, Sweden and the United Kingdom. The decay of SOC in these plots has been monitored for decades since the last inputs of plant material, providing the opportunity to test decomposition without the continuous input of new organic material. The models were run independently over multi‐year simulation periods (from 28 to 80 years) in a blind test with no calibration (Bln) and with the following three calibration scenarios, each providing different levels of information and/or allowing different levels of model fitting: (a) calibrating decomposition parameters separately at each experimental site (Spe); (b) using a generic, knowledge‐based parameterization applicable in the Central European region (Gen); and (c) combining strategies (a) and (b) (Mix). We addressed uncertainties from different modelling approaches with or without spin‐up initialization of SOC. Changes in the multi‐model median (MMM) of SOC were used as descriptors of the ensemble performance. On average across sites, Gen proved adequate in describing changes in SOC, with the MMM giving an average SOC (± standard deviation) of 39.2 (±15.5) Mg C/ha against an observed mean of 36.0 (±19.7) Mg C/ha in the last observed year, indicating sufficiently reliable SOC estimates.
Moving to Mix (37.5 ± 16.7 Mg C/ha) and Spe (36.8 ± 19.8 Mg C/ha) provided only marginal gains in accuracy, while requiring modellers to apply more knowledge and a greater calibration effort than in Gen, which limits the wider applicability of models.
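The ensemble summary reported above (a multi‐model median per site, then a mean and standard deviation across sites) can be illustrated with a minimal sketch. The function names, the site labels and all SOC values below are hypothetical and chosen only for illustration; they are not data from the study. Using the sample standard deviation is likewise an assumption, since the text does not state which estimator was applied.

```python
from statistics import mean, median, stdev


def multi_model_median(simulated_soc):
    """Median across models of simulated SOC (Mg C/ha) for one site and year."""
    return median(simulated_soc)


def summarize_across_sites(site_simulations):
    """Return (mean, sample standard deviation) of the per-site medians.

    Assumption: the sample standard deviation is used; the source text
    does not say which estimator the authors applied.
    """
    medians = [multi_model_median(v) for v in site_simulations.values()]
    return mean(medians), stdev(medians)


# Hypothetical simulated SOC (Mg C/ha) in the last observed year,
# one value per model. These are NOT values from the study.
example = {
    "site_A": [20.0, 22.0, 30.0],
    "site_B": [40.0, 44.0, 41.0],
    "site_C": [55.0, 60.0, 58.0],
}

avg, sd = summarize_across_sites(example)
print(f"MMM-based SOC across sites: {avg:.1f} (±{sd:.1f}) Mg C/ha")
```

The median is taken across models first because it is less sensitive than the mean to a single outlying model, which matters in an uncalibrated or lightly calibrated ensemble.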