Abstract. The search for a long-term benchmark for land-surface models (LSMs) has brought tree-ring data to the attention of the land-surface modelling community, because tree rings have recorded growth since well before human-induced environmental changes became important. We propose and evaluate an improved conceptual framework of when and how tree-ring data may, despite their sampling biases, be used as century-long hindcasting targets for evaluating LSMs. Four complementary benchmarks – size-related diameter growth, diameter increment of mature trees, diameter increment of young trees, and the response of tree growth to extreme events – were simulated using the ORCHIDEE version r5698 LSM and were verified against observations from 11 sites in the independent, unbiased European biomass network datasets. The potential for big-tree selection bias in the International Tree-Ring Data Bank (ITRDB) was investigated by subsampling the 11 sites from the European biomass network. We find that in about 95 % of the test cases, using ITRDB data would result in the same conclusions as using the European biomass network when the LSM is benchmarked against the annual radial growth during extreme climate years. The ITRDB data can be used with 70 % confidence when the LSM is benchmarked against the annual radial growth of mature trees or the size-related trend in annual radial growth. Care should be taken when using the ITRDB data to benchmark the annual radial growth of young trees, as only 50 % of the test cases were consistent with the results from the European biomass network. The proposed maximum tree diameter and annual growth increment benchmarks may enable the use of ITRDB data for large-scale validation of the LSM-simulated response of forest ecosystems to the transition from pre-industrial to present-day environmental conditions over the past century.
The results also suggest ways in which tree-ring width observations may be collected and/or reprocessed to provide long-term validation tests for land-surface models.