Abstract. The detection of meteorological, chemical, or other signals in modeled or observed air quality data, such as an estimate of a temporal trend in surface ozone data or an estimate of the mean ozone of a particular region during a particular season, is a critical component of modern atmospheric chemistry. However, the magnitude of a surface air quality signal is generally small compared to the magnitude of the underlying chemical, meteorological, and climatological variabilities (and their interactions) that exist both in space and in time, and which include variability in emissions and surface processes. This can present difficulties for both policymakers and researchers as they attempt to identify the influence or signal of climate trends (e.g., any pauses in warming trends), the impact of enacted emission reduction policies (e.g., United States NOx State Implementation Plans), or an estimate of the mean state of highly variable data (e.g., summertime ozone over the northeastern United States). Here we examine the scale dependence of the variability of simulated and observed surface ozone data within the United States, and the likelihood that a particular choice of temporal or spatial averaging scales produces a misleading estimate of a particular ozone signal. Our main objective is to develop strategies that reduce the likelihood of overconfidence in simulated ozone estimates. We find that while increasing the extent of both temporal and spatial averaging can enhance signal detection capabilities by reducing the noise from variability, a strategic combination of particular temporal and spatial averaging scales can maximize signal detection capabilities over much of the continental US.
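The idea that longer averaging reduces noise relative to a fixed signal can be illustrated with a minimal sketch. It assumes, purely for illustration, that year-to-year ozone anomalies are independent Gaussian noise with standard deviation `sigma`; under that assumption the noise in an N-year mean shrinks as `sigma / sqrt(N)`. The helper names `noise_of_mean` and `detection_ratio` and the example numbers are hypothetical and are not taken from this study. Real ozone anomalies are autocorrelated in time and correlated in space, so the actual noise reduction from temporal or spatial averaging is generally weaker than this idealized case.

```python
import math
import random
import statistics


def noise_of_mean(sigma: float, n_years: int, n_trials: int = 2000, seed: int = 0) -> float:
    """Empirical std. dev. of an n_years average of independent N(0, sigma) anomalies.

    Hypothetical helper: assumes white (uncorrelated) interannual noise.
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, sigma) for _ in range(n_years))
        for _ in range(n_trials)
    ]
    return statistics.stdev(means)


def detection_ratio(signal: float, sigma: float, n_years: int) -> float:
    """Signal-to-noise ratio using the analytic sigma / sqrt(n) noise of an n-year mean.

    Hypothetical helper: valid only if the averaged years are independent.
    """
    return signal / (sigma / math.sqrt(n_years))


if __name__ == "__main__":
    # Illustrative values only: a 1 ppb signal against 3 ppb interannual noise.
    for n in (1, 5, 10, 15):
        print(n, round(detection_ratio(1.0, 3.0, n), 3), round(noise_of_mean(3.0, n), 3))
```

With these illustrative numbers the signal-to-noise ratio rises from about 0.33 for a single year to about 1.05 at 10 years and about 1.29 at 15 years, which shows why averaging periods on the order of a decade or more can matter for small signals.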
For signals that are large compared to the meteorological variability (e.g., strong emissions reductions), shorter averaging periods and smaller spatial averaging regions may be sufficient. However, for many signals that are smaller than or comparable in magnitude to the underlying meteorological variability, we recommend temporal averaging of 10–15 years combined with some level of spatial averaging (up to several hundred kilometers). If this level of averaging is not practical (e.g., the signal being examined is at a local scale), we recommend some exploration of the spatial and temporal variability to provide context and confidence in the robustness of the results.

Published by Copernicus Publications on behalf of the European Geosciences Union.