Abstract. Simulation models are widely used in urban drainage engineering and research, but they are known to include errors and uncertainties that are not yet fully understood. Within the framework developed here, we investigate model adequacy across multiple sites by comparing model results with measurements for three model objectives, namely surcharges (water levels rise above defined critical levels related to basement flooding), overflows (water levels rise above a crest level), and everyday events (water levels stay below the top of pipes). We use
multi-event hydrological signatures, i.e. metrics that extract specific
characteristics of time series events, to compare model results with observations for these objectives through categorical and
statistical data analyses. Furthermore, we assess the events with respect to sufficient or insufficient categorical performance and good, acceptable, or poor statistical performance. We also develop a method to reduce the
weighting of individual events in the analyses, in order to acknowledge
uncertainty in the model and/or the measurements in cases where the model is not
expected to fully replicate the measurements. A case study including several years of water level measurements from 23 sites in two different areas shows that only a few sites achieve sufficient categorical performance for the overflow objective and that sites do not necessarily
obtain good performance scores for all the analysed objectives. The
developed framework, however, highlights that it is possible to identify
objectives and sites for which the model is reliable. We also suggest
methods, which may be refined in the future, for assessing where the
model is less reliable and needs further improvement.