The ability of a large ensemble of 15 state-of-the-art regional climate models (RCMs) to simulate precipitation extremes was investigated. The 99th, 99.9th and 99.99th percentiles of daily precipitation in the models were compared with those in the recently released E-OBS observational database for winter, spring, summer and autumn. The majority of the models overestimated the precipitation extremes compared with E-OBS, on average by approximately 38%, and for some models the overestimation exceeded 50%. To measure model performance, a simple metric is proposed that averages a nonlinear function of the seasonal biases over the European area. The sensitivity of the metric to different assumptions in its construction and to the quality of the observational data was explored. In general, the metric showed low sensitivity to spatial and seasonal averaging; however, it was highly sensitive to potential biases in the observational database. An alternative metric that measures the spatial pattern of the extremes (and is therefore insensitive to a potential constant offset in the observational data) was also explored. With this metric, the ranking of the models changed substantially; however, the 2 models with the worst scores in the standard metric also displayed the worst scores with the alternative metric. Finally, the regional climate models displayed the largest biases compared with E-OBS in areas where the underlying station density used in E-OBS is low, suggesting that data quality is indeed an important issue. In summary, the results show that: (1) no metric guarantees an objective and precise ranking or weighting of the models, (2) by exploring different metrics it nevertheless appears possible to identify models that perform consistently worse than other models, and (3) observational data quality should be considered when designing and interpreting metrics.
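
As an illustration only (the abstract does not state the exact definition), one plausible form of such a bias metric is sketched below, assuming a relative bias B of a model percentile against its E-OBS counterpart and an unspecified nonlinear weighting function f applied before averaging over seasons and grid points of the European area:

% Illustrative sketch only; the symbols are assumptions, not the paper's definition.
% P^mod_{s,g} and P^obs_{s,g}: a model and observed precipitation percentile
% (e.g. the 99th percentile of daily precipitation) in season s at grid point g;
% f: some nonlinear function of the relative bias (e.g. its square or absolute value);
% N_s, N_g: numbers of seasons and grid points over the European area.
\[
  B_{s,g} = \frac{P^{\mathrm{mod}}_{s,g} - P^{\mathrm{obs}}_{s,g}}{P^{\mathrm{obs}}_{s,g}},
  \qquad
  M = \frac{1}{N_s\,N_g} \sum_{s=1}^{N_s} \sum_{g=1}^{N_g} f\!\left(B_{s,g}\right).
\]

Under this reading, a constant multiplicative offset in the observational data shifts every B_{s,g} and hence M directly, which is consistent with the reported sensitivity to observational biases; a pattern-based metric that compares only the spatial structure of the percentiles would not be affected by such an offset.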