The ability of current global models to simulate the transport of CO₂ by mid-latitude, synoptic-scale weather systems (i.e., CO₂ weather) is important for inverse estimates of regional and global carbon budgets but remains unclear without comparisons to targeted measurements. Here, we evaluate ten models that participated in the Orbiting Carbon Observatory-2 model intercomparison project (OCO-2 MIP version 9) with intensive aircraft measurements collected from the Atmospheric Carbon Transport (ACT)-America mission. We quantify model-data differences in the spatial variability of CO₂ mole fractions, mean winds, and boundary layer depths in 27 mid-latitude cyclones spanning four seasons over the central and eastern United States. We find that the OCO-2 MIP models are able to simulate observed CO₂ frontal differences with varying degrees of success in summer and spring, and most underestimate frontal differences in winter and autumn. The models may underestimate the observed boundary layer-to-free troposphere CO₂ differences in spring and autumn due to model errors in boundary layer height. Attribution of the causes of model biases in other seasons remains elusive. Transport errors, prior fluxes, and/or inversion algorithms appear to be the primary cause of these biases since model performance is not highly sensitive to the CO₂ data used in the inversion. The metrics presented here provide new benchmarks regarding the ability of atmospheric inversion systems to reproduce the CO₂ structure of mid-latitude weather systems. Controlled experiments are needed to link these metrics more directly to the accuracy of regional or global flux estimates.

Plain Language Summary

Global flux estimate systems use CO₂ observations, atmospheric transport models, CO₂ flux models (emissions and absorption), and mathematical optimization methods to estimate biosphere-atmosphere CO₂ exchange.
Accurate representation of atmospheric transport is important for a reliable optimization of fluxes in these systems. We use intensive aircraft measurements of wind speed, boundary layer height, and horizontal and vertical differences of CO₂ concentrations within 27 mid-latitude cyclones collected by the Atmospheric Carbon Transport (ACT)-America mission to evaluate the performance of ten global flux estimate systems from the Orbiting Carbon Observatory-2 model intercomparison project (OCO-2 MIP). We find the models can simulate observed horizontal CO₂ differences between the warm and cold parts of cyclones with different degrees of success in summer and spring, but often underestimate the observed cross-frontal and vertical differences in CO₂ in winter and autumn. The models may underestimate the CO₂ differences between the boundary layer and the free troposphere due to model errors in boundary layer height and surface fluxes. These weather-oriented CO₂ metrics provide benchmarks for testing simulations of the CO₂ structure within cyclones. Future efforts are needed to link these metrics more directly to the accuracy of CO₂ flux esti...