Spatiotemporally resolved urban fossil fuel CO2 (FFCO2) emissions are critical to urban carbon cycle research and urban climate policy. Two general scientific approaches have been taken to estimate spatiotemporally explicit urban FFCO2 fluxes, referred to here as "downscaling" and "bottom-up." Bottom-up approaches can specifically characterize the CO2-emitting infrastructure in cities but are labor-intensive to build and are currently available for only a few U.S. cities. Downscaling approaches, often available globally, require proxy information to allocate or distribute emissions, which introduces additional uncertainty. We present a comparison of a downscaled FFCO2 emission data product, the Open-source Data Inventory for Anthropogenic CO2 (ODIAC), to a bottom-up estimate (Hestia) in four U.S. urban areas in an effort to better isolate and understand differences between the approaches. We find whole-city differences ranging from −1.5% (Los Angeles Basin) to +20.8% (Salt Lake City). At the 1 km × 1 km spatial scale, comparisons reveal a low-emission limit in ODIAC driven by saturation of the nighttime light spatial proxy. At this resolution, the median difference between the two approaches ranged from 47% to 84% depending on the city, with correlations ranging from 0.34 to 0.68. The largest discrepancies were found for large point sources and the on-road sector, suggesting that downscaled FFCO2 data products could be improved by incorporating independent large point-source estimates and estimating on-road sources with a relevant spatial surrogate. Progressively coarsening the spatial resolution improves agreement, but at grid cell areas greater than approximately 25 km², the returns to agreement diminish, suggesting a practical resolution limit when using downscaled approaches.
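
The grid-level comparison summarized above can be illustrated with a minimal Python sketch. It is not the procedure used to produce the reported numbers: the function names `coarsen` and `compare`, the sign convention (downscaled minus bottom-up, relative to bottom-up), the use of an absolute per-cell percent difference, and the synthetic input grids are all assumptions made for illustration. The sketch aggregates two co-registered emission grids into progressively larger blocks and reports a whole-domain percent difference, a median per-cell percent difference, and a Pearson correlation at each resolution.

```python
import numpy as np


def coarsen(grid, factor):
    """Sum emissions over non-overlapping factor x factor blocks.

    Trailing rows/columns that do not fill a complete block are dropped.
    """
    rows, cols = grid.shape
    rows -= rows % factor
    cols -= cols % factor
    trimmed = grid[:rows, :cols]
    blocks = trimmed.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.sum(axis=(1, 3))


def compare(downscaled, bottom_up):
    """Return (whole-domain % difference, median per-cell |% difference|, Pearson r).

    Assumed convention: differences are (downscaled - bottom_up) / bottom_up.
    Cells where the bottom-up value is zero are excluded from the per-cell median.
    """
    total_diff = 100.0 * (downscaled.sum() - bottom_up.sum()) / bottom_up.sum()
    mask = bottom_up > 0
    cell_diff = 100.0 * np.abs(downscaled[mask] - bottom_up[mask]) / bottom_up[mask]
    r = np.corrcoef(downscaled.ravel(), bottom_up.ravel())[0, 1]
    return float(total_diff), float(np.median(cell_diff)), float(r)


# Synthetic 1 km x 1 km grids standing in for the two data products (hypothetical data).
rng = np.random.default_rng(0)
bottom_up = rng.lognormal(mean=0.0, sigma=1.0, size=(40, 40))
downscaled = bottom_up * rng.lognormal(mean=0.0, sigma=0.5, size=(40, 40))

for factor in (1, 2, 5, 10):  # cell areas of 1, 4, 25, and 100 km^2
    total, median_cell, r = compare(coarsen(downscaled, factor), coarsen(bottom_up, factor))
    print(f"{factor * factor:>3} km^2: total {total:+.1f}%, median cell {median_cell:.1f}%, r = {r:.2f}")
```

Block summation is used here rather than averaging so that the domain total is preserved at every resolution whenever the grid dimensions are divisible by the aggregation factor.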