Abstract. The search for a long-term benchmark for land-surface models (LSMs) has brought tree-ring data to the attention of the land-surface community because they record growth well before human-induced environmental changes became important. The most comprehensive archive of publicly shared tree-ring data is the International Tree-ring Data Bank (ITRDB). Many records in the ITRDB have, however, been collected almost exclusively with a view to maximizing an environmental target signal (e.g. climate), which has resulted in a biased representation of forested sites and landscapes and thus limits the archive's use as a data source for benchmarking. The aim of this study is to propose advances in land-surface modelling and data processing that enable the land-surface community to re-use the ITRDB data as a much-needed century-long benchmark. Given that tree-ring width is largely explained by phenology, tree size, and climate sensitivity, LSMs that intend to use it as a benchmark should at least simulate tree phenology, size-dependent growth, differently sized trees within a stand, and responses to changes in temperature, precipitation and atmospheric CO2 concentrations. Yet, even if LSMs were capable of accurately simulating tree-ring width, sampling biases in the ITRDB would still need to be accounted for. This study proposes two solutions: exploiting the observation that the variation due to size-related growth by far exceeds the variation due to environmental changes; and simulating a size-structured population of trees. Combining the proposed advances in modelling and data processing resulted in four complementary benchmarks, each reflecting a different use of the information contained in the ITRDB and each described by two metrics rooted in statistics that quantify the performance of the benchmark.
Although the proposed benchmarks are unlikely to be precise, they advance the field by providing a much-needed large-scale constraint on changes in the simulated maximum tree diameter and annual growth increment for the transition from pre-industrial to present-day environmental conditions over the past century. Hence, the proposed benchmarks open up new ways of exploring the ITRDB archive, stimulate the dendrochronological community to refine its sampling protocols to produce new and spatially unbiased tree-ring networks, and help the modelling community to move beyond the short-term benchmarking of LSMs.