<p><strong>Abstract.</strong> Atmospheric reanalyses are data-assimilating weather models that are widely used as proxies for the true state of the atmosphere in the recent past, particularly in the stratosphere, where historical observations are sparse. But how realistic are these stratospheric reanalyses? Here, we resample stratospheric temperature data from six modern reanalyses (CFSR, ERA-5, ERA-Interim, JRA-55, JRA-55C and MERRA-2) to produce synthetic satellite observations. We compare these directly to retrieved temperatures from the COSMIC, HIRDLS and SABER instruments and to brightness temperatures from the AIRS instrument over the ten-year period 2003&#8211;2012. We deliberately sample the standard public-release products in order to best assess their suitability for typical use cases. We find that all-time, all-latitude correlations between limb sounder observations and synthetic observations from full-input reanalyses are 0.97&#8211;0.99 at 30&#8201;km altitude, falling to 0.84&#8211;0.94 at 50&#8201;km. Correlations are highest at high latitudes and lowest in the subtropics, but root-mean-square (RMS) differences are largest (10&#8201;K or greater) in high-latitude winter. At all latitudes, differences grow with height. High-altitude differences become especially large during disturbed periods such as the recovery phase following a sudden stratospheric warming, when zonal-mean differences between datasets can reach 18&#8201;K. We further show that, for the current generation of reanalysis products, full-3D sampling is always required to produce realistic synthetic AIRS observations but is almost never required to produce realistic synthetic HIRDLS observations. For synthetic SABER and COSMIC observations, full-3D sampling is required in equatorial regions and in regions of high gravity-wave activity, but not elsewhere. 
Finally, we use cluster analysis to show that full-input reanalyses are more tightly correlated with each other than with observations, even observations that they assimilate. This may suggest that these reanalyses are over-tuned to match their comparators. If so, this could have significant implications for future reanalysis development.</p>