Progress in high‐resolution numerical weather prediction (NWP) for urban areas will require new modelling approaches and extensive evaluation. Here, we exploit land surface temperature (LST) data from Landsat‐8 to assess 100 m resolution NWP for London (UK) on four cloud‐free days. The LST observations are directional radiometric temperatures with non‐negligible uncertainties. We consider the challenges of informative comparison between the Landsat LST and the NWP scheme's internal characterisation of the complete surface temperature. The LST spatial coverage allows large‐scale observation–model differences to be explored. In one case, the NWP surface temperature shows obvious spatial artifacts relative to the Landsat LST. These are found to be related to the NWP's initial method of downscaling soil moisture using soil properties. Updated model runs show higher spatial correlation between model and Landsat LST. In cases where meteorological conditions favour the formation of horizontal convective rolls, warmer air temperatures associated with updraughts in the mixed layer extend inappropriately to the urban surface. This manifests as warm stripes in the model surface temperature that are not present in the Landsat LST. NWP–Landsat LST differences are larger in more built‐up areas on days nearer the summer solstice. This is largely attributed to urban thermal anisotropy, as Landsat preferentially views warmer urban surfaces, whereas the model LST represents all surfaces. We evaluate two approaches to quantify this sampling effect, but further work is needed to fully constrain it and facilitate more informative model evaluation.
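As an illustration of the kind of model–observation comparison summarised above, the following is a minimal sketch of computing a spatial (Pearson) correlation and a mean model-minus-observation difference between two LST fields. It assumes both fields have already been placed on a common grid and that missing pixels (for example, cloud-masked Landsat pixels) are marked as NaN. The function name `spatial_agreement` and its inputs are hypothetical, and this is not necessarily the procedure used in the study.

```python
import math


def spatial_agreement(model_lst, obs_lst):
    """Return (pearson_r, mean_difference) over pixels valid in both grids.

    Hypothetical helper: both inputs are 2-D lists (rows of floats, in K)
    on the same grid, with NaN marking missing pixels.
    """
    if len(model_lst) != len(obs_lst):
        raise ValueError("grids must have the same number of rows")

    # Collect only pixels where both the model and the observation are valid.
    pairs = []
    for model_row, obs_row in zip(model_lst, obs_lst):
        if len(model_row) != len(obs_row):
            raise ValueError("grid rows must have the same length")
        for m, o in zip(model_row, obs_row):
            if not (math.isnan(m) or math.isnan(o)):
                pairs.append((m, o))

    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two valid pixel pairs")

    mean_m = sum(m for m, _ in pairs) / n
    mean_o = sum(o for _, o in pairs) / n

    # Sums of squares and cross-products about the means.
    cov = sum((m - mean_m) * (o - mean_o) for m, o in pairs)
    var_m = sum((m - mean_m) ** 2 for m, _ in pairs)
    var_o = sum((o - mean_o) ** 2 for _, o in pairs)
    if var_m == 0.0 or var_o == 0.0:
        raise ValueError("correlation is undefined for a constant field")

    r = cov / math.sqrt(var_m * var_o)
    return r, mean_m - mean_o
```

A usage example with made-up values: `spatial_agreement([[300.0, 302.0], [304.0, float("nan")]], [[301.0, 303.0], [305.0, 299.0]])` ignores the masked pixel and compares the remaining three pairs. A domain-wide correlation of this kind says nothing about view-angle sampling, so it would not by itself separate model error from the thermal anisotropy effect discussed above.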