Abstract. Ice sheet models are the main tool for forecasting ice sheet mass loss, a significant contributor to sea-level rise, so knowing the likelihood of such projections is of critical societal importance. However, to capture the complete range of possible projections of mass loss, ice sheet models need efficient methods to quantify the forecast uncertainty. Uncertainties originate from the model structure, from the climate and ocean forcing used to run the model, and from model calibration. Here we quantify the last of these, applying an error propagation framework to a realistic setting in West Antarctica. As in many other ice-sheet modelling studies, we use a control method to calibrate grid-scale flow parameters (parameters describing the basal drag and ice stiffness) with remotely sensed observations. Our framework, however, augments the control method with a Hessian-based Bayesian approach that estimates the posterior covariance of the inverted parameters. This enables us to quantify the impact of calibration uncertainty on forecasts of sea-level rise contribution, or volume above flotation (VAF), under different choices of regularisation strength (prior strength), sliding law and velocity input. We find that calibrating with different satellite ice velocity products leads our model to different estimates of VAF after 40 years. We use this difference in model output to quantify the variance that projections of VAF are expected to have after 40 years, and we identify prior strengths that can reproduce that variability. We demonstrate that if we use prior strengths suggested by L-curve analysis, as is typically done in ice-sheet calibration studies, our uncertainty quantification is not able to reproduce that same variability. 
The regularisation suggested by the L-curves is too strong: propagating the observational error through to VAF under this choice of prior yields uncertainties that are smaller than those suggested by our two-member “sample” of observed velocity fields. Additionally, our experiments suggest that a large number of data points may be redundant, with implications for the propagation of error to VAF.