[1] Forecasts of ozone (O3) and fine particulate matter (particles with diameter less than 2.5 μm, PM2.5) from seven air quality forecast models (AQFMs) are statistically evaluated against observations from the Aerometric Information Retrieval Now (AIRNow) network across eastern Texas and adjoining states during August and September 2006 (49 days). Ensemble O3 and PM2.5 forecasts, created by combining the seven separate forecasts with equal weighting, and simple bias-corrected forecasts are also evaluated using standard statistical measures, threshold statistics, and variance analysis. For O3, the models and the ensemble generally show statistical skill relative to persistence for the region as a whole but fail to predict high-O3 events in the Houston region. For PM2.5, neither the individual models nor the ensemble shows statistical skill, and all but one model have a significant low bias. To help diagnose the sources of model bias at the surface, the forecasts are compared comprehensively with the full suite of chemical and aerosol measurements collected aboard the NOAA WP-3 aircraft during the summer 2006 Second Texas Air Quality Study and Gulf of Mexico Atmospheric Composition and Climate Study (TexAQS II/GoMACCS) field study. Flights designed specifically to sample the Houston and Dallas urban plumes are used to determine modeled and observed upwind (background) biases, as well as the downwind excess concentrations from which relative emission rates are inferred. Relative emissions in version 3 of the U.S. Environmental Protection Agency 1999 National Emission Inventory (NEI-99), which was used in two of the model forecasts, are evaluated by comparing observed and modeled concentration difference ratios. Model comparisons show that concentration difference ratios give a reasonably accurate measure (within 25%) of relative input emissions.
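The ensemble, bias-correction, and skill calculations are not specified in detail here. The following is a minimal sketch under stated assumptions: the ensemble is the unweighted mean of the individual forecasts at each time; "simple bias correction" is taken to mean subtracting a forecast's mean bias over a trailing window of preceding days (a hypothetical choice); and skill relative to persistence is measured as 1 − RMSE_model / RMSE_persistence, where persistence is the previous day's observation (one common definition, assumed here). All function names and example values are illustrative.

```python
from math import sqrt
from statistics import mean


def equal_weight_ensemble(forecasts):
    """Unweighted mean across models at each time step.

    `forecasts` is a list of per-model series, all aligned in time.
    """
    return [mean(values) for values in zip(*forecasts)]


def bias_corrected(forecast, observed, window=7):
    """Assumed scheme: subtract the mean (forecast - observed) bias
    over the preceding `window` days. Only past days are used, as in
    real-time forecasting; the first day has no history and is left
    uncorrected."""
    corrected = []
    for t, f in enumerate(forecast):
        start = max(0, t - window)
        past_bias = [forecast[i] - observed[i] for i in range(start, t)]
        bias = mean(past_bias) if past_bias else 0.0
        corrected.append(f - bias)
    return corrected


def rmse(predicted, observed):
    return sqrt(mean((p - o) ** 2 for p, o in zip(predicted, observed)))


def skill_vs_persistence(predicted, observed):
    """Assumed skill score: 1 - RMSE(model) / RMSE(persistence).

    Persistence predicts day t with the observation from day t-1, so
    both scores are computed from the second day onward. Positive
    values mean the model beats persistence.
    """
    persistence = observed[:-1]
    return 1.0 - rmse(predicted[1:], observed[1:]) / rmse(persistence, observed[1:])


# Hypothetical daily maximum O3 (ppbv) from two models and observations.
models = [[50, 60, 70, 80], [60, 70, 80, 90]]
obs = [50, 62, 71, 83]
ens = equal_weight_ensemble(models)
print(ens, bias_corrected(ens, obs), round(skill_vs_persistence(ens, obs), 3))
```

Using only past days in the bias correction keeps the sketch consistent with how a bias-corrected forecast would be issued operationally, before the verifying observation is available.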
Boundary layer height and wind data are combined with the observed differences between upwind and downwind concentrations to estimate absolute emissions. When the NEI-99 inventory is modified to include NOy emissions observed by continuous monitors and the NOx decreases expected from mobile sources between 1999 and 2006, its emissions agree well with those derived from the observations for both Houston and Dallas. However, the emission inventories consistently overpredict the ratio of CO to NOy. The ratios of ethylene and aromatics to NOy are reasonably consistent with observations for Dallas but are significantly underpredicted for Houston.
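The combination of boundary layer height, wind, and upwind/downwind differences can be illustrated with a simple mass-balance sketch, along with the concentration difference ratios discussed earlier. The sketch assumes a plume transect flown perpendicular to the mean wind within a well-mixed boundary layer, uniform sample spacing, and a single upwind background value per species. Under these assumptions the emission rate is E ≈ U · H · Σ ΔC_i Δy, with ΔC converted from mixing ratio to molar concentration. The function names, the default pressure and temperature, and the example numbers are hypothetical and are not taken from the study.

```python
R = 8.314  # universal gas constant, J mol^-1 K^-1


def air_molar_density(pressure_pa, temperature_k):
    """Ideal-gas molar density of air, mol m^-3."""
    return pressure_pa / (R * temperature_k)


def plume_excess(downwind_ppbv, background_ppbv):
    """Downwind enhancement above the upwind (background) value."""
    return [c - background_ppbv for c in downwind_ppbv]


def mass_balance_emission(downwind_ppbv, background_ppbv, dy_m, wind_ms,
                          blh_m, pressure_pa=1.0e5, temperature_k=298.0):
    """Emission rate in mol s^-1, assuming a well-mixed boundary layer of
    depth `blh_m`, a transect perpendicular to a uniform wind `wind_ms`,
    and uniform crosswind sample spacing `dy_m`."""
    n_air = air_molar_density(pressure_pa, temperature_k)
    excess = plume_excess(downwind_ppbv, background_ppbv)
    # Crosswind integral of the excess, converted ppbv -> mol m^-3 (mol m^-2).
    integral = sum(excess) * 1e-9 * n_air * dy_m
    return wind_ms * blh_m * integral


def difference_ratio(x_down, x_bg, ref_down, ref_bg):
    """Concentration difference ratio, e.g. delta CO / delta NOy, from
    the integrated excesses along the same transect."""
    return sum(plume_excess(x_down, x_bg)) / sum(plume_excess(ref_down, ref_bg))


# Hypothetical transect: five samples 1 km apart, 5 m/s wind, 1 km mixed layer.
noy = [10.0, 20.0, 30.0, 20.0, 10.0]
co = [110.0, 140.0, 170.0, 140.0, 110.0]
e_noy = mass_balance_emission(noy, 5.0, dy_m=1000.0, wind_ms=5.0, blh_m=1000.0)
print(round(e_noy, 2), "mol/s NOy;",
      round(difference_ratio(co, 100.0, noy, 5.0), 3), "mol CO per mol NOy")
```

Multiplying a ratio such as ΔCO/ΔNOy by the mass-balance NOy rate gives a corresponding CO rate. This is one simple way an observed ratio and an absolute estimate can be compared against an inventory under the stated assumptions.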