Abstract. The Canadian Land Surface Scheme Including Biogeochemical Cycles (CLASSIC) is an open-source community model designed to address research questions that explore the role of the land surface in the global climate system. Here we evaluate how well CLASSIC reproduces the energy, water, and carbon cycles when forced with quasi-observed meteorological data. Model skill scores summarize how well model output agrees with observation-based reference data across multiple statistical metrics. A lack of agreement may be due to deficiencies in the model, its forcing data, and/or the reference data. To address uncertainties in the forcing, we evaluate an ensemble of CLASSIC runs that is based on three meteorological data sets. To account for observational uncertainty, we compute benchmark skill scores that quantify the level of agreement among independent reference data sets. The benchmark scores demonstrate what score values a model may realistically achieve given the uncertainties in the observations. Our results show that the uncertainties associated with the forcing and the observations are large. For instance, for 10 out of 19 variables assessed in this study, the sign of the bias changes depending on which forcing and reference data are used. Benchmark scores are much lower than expected, implying large observational uncertainties. Model and benchmark score values are mostly similar, indicating that CLASSIC performs well when observational uncertainty is taken into account. Using the difference between model and benchmark scores as a measure of performance shows that model skill increases in the following order: fractional area burned, runoff, soil heat flux, leaf area index, net shortwave radiation, net ecosystem exchange, above-ground biomass, gross primary productivity, surface albedo, snow water equivalent, net surface radiation, sensible heat flux, net longwave radiation, latent heat flux, and ecosystem respiration.
Our results will serve as a baseline for guiding and monitoring future CLASSIC development.