As part of the evaluation of IGRF-11 candidate models, we compared the candidate models with actual measurements. We first carried out a residual analysis between the main field candidates and CHAMP data, which were pre-processed and corrected for the secular variation and for the lithospheric, external and oceanic fields. For epoch 2005.0, one model (D) is abnormally far from the testing dataset, while four models (A, B, F, G) have the smallest data residuals. For epoch 2010.0, three models (B, F, G) have smaller data residuals than the other models. These results, although biased toward models relying on datasets close to the testing datasets (B, F), usefully complement the results of intercomparisons between models. We next tested secular variation candidate models for 2010-2015 against annual differences of (a) definitive monthly means in 2007 and 2008 at 86 observatories, and (b) quasi-definitive monthly means from January to October 2009 at nine observatories where this new type of data was produced. Quasi-definitive data are found to significantly improve the discriminating power of the test, favoring models obtained at epochs close to the end of 2009 (B, F) and penalizing some extrapolated models (G). They also enable a truly independent validation of the candidate models.
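
The secular variation test described above can be illustrated with a minimal sketch. It assumes that an annual difference is the monthly mean of one field component minus the monthly mean of the same component twelve months earlier, and that a candidate is scored by the root-mean-square misfit between its predicted secular variation and these differences. The function names (`annual_differences`, `rms_misfit`) and the synthetic numbers are hypothetical and are not taken from the evaluation itself.

```python
import math


def annual_differences(monthly_means):
    """Annual differences (nT/yr) of a monthly-mean series (nT).

    Element i is monthly_means[i + 12] - monthly_means[i], i.e. the change
    over one year. Differencing over twelve months suppresses annual
    variations in the series.
    """
    return [
        monthly_means[i + 12] - monthly_means[i]
        for i in range(len(monthly_means) - 12)
    ]


def rms_misfit(observed, predicted):
    """Root-mean-square difference between two equal-length series."""
    if len(observed) != len(predicted):
        raise ValueError("series must have the same length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic example (assumed values): one component at one observatory,
# 24 monthly means with a steady trend of 2 nT per month (24 nT/yr).
monthly = [30000.0 + 2.0 * month for month in range(24)]
observed_sv = annual_differences(monthly)

# Hypothetical candidate predicting a constant 20 nT/yr at this site.
predicted_sv = [20.0] * len(observed_sv)
misfit = rms_misfit(observed_sv, predicted_sv)
```

In an actual test, the predicted values would come from evaluating each candidate's secular variation coefficients at the observatory location, and the misfits would be accumulated over all components and observatories.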