Abstract. Comparisons with ground-based correlative measurements constitute a key component in the validation of satellite data on atmospheric composition. The error budget of these comparisons contains not only the measurement errors but also several terms related to differences in sampling and smoothing of the inhomogeneous and variable atmospheric field. A versatile system for Observing System Simulation Experiments (OSSEs), named OSSSMOSE, is used here to quantify these terms. Based on the application of pragmatic observation operators onto high-resolution atmospheric fields, it allows a simulation of each individual measurement and, consequently, of the differences to be expected from spatial and temporal field variations between the two measurements making up a comparison pair. As a topical case study, the system is used to evaluate the error budget of total ozone column (TOC) comparisons between GOME-type direct fitting (GODFITv3) satellite retrievals from GOME/ERS2, SCIAMACHY/Envisat, and GOME-2/MetOp-A, and ground-based direct-sun and zenith-sky reference measurements, such as those from Dobson and Brewer instruments and from zenith-scattered light (ZSL-)DOAS instruments, respectively. In particular, the focus is placed on the GODFITv3 reprocessed GOME-2A data record vs. the ground-based instruments contributing to the Network for the Detection of Atmospheric Composition Change (NDACC). The simulations are found to reproduce the actual measurements almost to within the measurement uncertainties, confirming that the OSSE approach and its technical implementation are appropriate. This work reveals that many features of the comparison spread and median difference can be understood as due to metrological differences, even when using strict colocation criteria. In particular, sampling difference errors regularly exceed measurement uncertainties at most mid- and high-latitude stations, reaching 10 % or more in extreme cases.
Smoothing difference errors only play a role in the comparisons with ZSL-DOAS instruments at high latitudes, especially in the presence of a polar vortex, owing to the strong TOC gradient it induces. At tropical latitudes, where TOC variability is lower, both types of errors remain below about 1 % and consequently do not contribute significantly to the comparison error budget. The detailed analysis of the comparison results, including the metrological errors, suggests that the published random measurement uncertainties for GODFITv3 reprocessed satellite data are potentially overestimated, and adjustments are proposed here. The successful use of the OSSSMOSE system to close the error budget of TOC comparisons for the first time bodes well for potential future applications, which are briefly discussed.