In this study we analyzed the performance of 12 state-of-the-art global climate models (GCMs) from two different model generations used in the ENSEMBLES project (a European Commission-funded climate-change research project) over southwestern Europe. For this purpose, we assessed the similarity of the simulated and quasi-observed (reanalysis) probability density functions for circulation, temperature, and humidity variables at various pressure levels, which we chose from a statistical-downscaling point of view. Our main goals were to assess which GCM variables can be reliably used as predictors for downscaling, and which GCMs perform especially well over the region under study. Results showed that specific humidity is as reliably reproduced as circulation and temperature variables, and that overall performance is best for the Hadley Centre's HADGEM2 model. Secondary goals were to estimate the skillful scale of the models, and to measure the added value of bias correction, a post-processing step commonly used in practice. We found that all models perform poorly at the scale of individual grid boxes, indicating that they are not robustly skillful at their smallest scale. We also found that model performance generally improves after removing the monthly bias. However, errors in higher-order moments, which cannot be removed by simply correcting the bias, were common in some models.
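To illustrate why bias correction helps with some model errors but not others, the following minimal sketch compares synthetic "model" and "reanalysis" samples. It rests on several assumptions that are not taken from the study: the similarity measure is a simple histogram-overlap score, bias is removed as a single overall mean rather than month by month, and the data are random numbers rather than GCM output. The function names `pdf_overlap_score` and `remove_mean_bias` are hypothetical.

```python
import numpy as np


def pdf_overlap_score(model, reference, bins=50):
    """Overlap of two empirical PDFs (assumed measure, not the study's).

    Returns 1.0 for identical histograms and 0.0 for disjoint ones.
    """
    # Use common bin edges so the two histograms are directly comparable.
    lo = min(model.min(), reference.min())
    hi = max(model.max(), reference.max())
    edges = np.linspace(lo, hi, bins + 1)
    p_model, _ = np.histogram(model, bins=edges)
    p_ref, _ = np.histogram(reference, bins=edges)
    # Normalize counts to relative frequencies that sum to 1.
    p_model = p_model / p_model.sum()
    p_ref = p_ref / p_ref.sum()
    # Sum the shared probability mass bin by bin.
    return float(np.minimum(p_model, p_ref).sum())


def remove_mean_bias(model, reference):
    """Shift the model sample so its mean matches the reference mean.

    Simplification: one overall mean instead of a separate bias per month.
    """
    return model - (model.mean() - reference.mean())


# Synthetic stand-ins for reanalysis and GCM data (illustrative only).
rng = np.random.default_rng(0)
reference = rng.normal(loc=10.0, scale=2.0, size=5000)
model_biased = rng.normal(loc=12.0, scale=2.0, size=5000)  # error in the mean only
model_wide = rng.normal(loc=12.0, scale=4.0, size=5000)    # error in mean and variance

score_raw = pdf_overlap_score(model_biased, reference)
score_corrected = pdf_overlap_score(
    remove_mean_bias(model_biased, reference), reference
)
score_wide_corrected = pdf_overlap_score(
    remove_mean_bias(model_wide, reference), reference
)

print(f"mean error only, raw:        {score_raw:.2f}")
print(f"mean error only, corrected:  {score_corrected:.2f}")
print(f"variance error, corrected:   {score_wide_corrected:.2f}")
```

In this toy setting, removing the mean raises the score when the only error is a shift, whereas the sample with the wrong variance still scores lower after correction, which mirrors the point about errors in higher-order moments.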