Abstract. We evaluate four high-resolution model simulations of pollutant emissions, chemical transformation, and downwind transport for the Athabasca oil sands using the Global Environmental Multiscale-Modelling Air-quality and Chemistry (GEM-MACH) model. The simulations span a time period corresponding to an aircraft measurement campaign in the summer of 2013, and model results are compared with surface monitoring network and aircraft observations of multiple pollutants. We focus here on the impact of different representations of the model's aerosol size distribution and plume-rise parameterization on model results. The use of a more finely resolved representation of the aerosol size distribution was found to have a significant impact on model performance, reducing the magnitude of the original surface PM2.5 negative biases by 32 %, from −2.62 to −1.72 µg m−3. We compared stack plume observations with model predictions of SO2, NO2, and speciated particulate matter concentrations from simulations employing the commonly used Briggs (1984) plume-rise algorithms to redistribute emissions from large stacks. As in our companion paper (Gordon et al., 2017), we found that Briggs algorithms based on estimates of atmospheric stability at the stack height resulted in under-predictions of plume rise, with 116 out of 176 test cases falling below the model : observation 1 : 2 line, 59 cases falling within a factor of 2 of the observed plume heights, and an average model plume height of 289 m compared to an average observed plume height of 822 m. We used a high-resolution meteorological model to confirm the presence of significant horizontal heterogeneity in the local meteorological conditions driving plume rise.
Using these simulated meteorological conditions at the stack locations, we found that a layered buoyancy approach for estimating plume rise in stable to neutral atmospheres, coupled with the assumption of free rise in convectively unstable atmospheres, resulted in much better model performance relative to observations (124 out of 176 cases falling within a factor of 2 of the observed plume height, with 69 of these cases above and 55 of these cases below the 1 : 1 line and within a factor of 2 of observed values). This is in contrast to our companion paper, wherein this layered approach (driven by meteorological observations not co-located with the stacks) showed a relatively modest impact on predicted plume heights. Persistent issues with over-fumigation of plumes in the model were linked to a more rapid decrease in simulated temperature with increasing height than was observed. This in turn may have led to overestimates of near-surface diffusivity, resulting in excessive fumigation.
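The factor-of-2 counts quoted above (cases below the 1 : 2 line, cases within a factor of 2 split about the 1 : 1 line) can be illustrated with a minimal sketch. This is not the evaluation code used in the study: the function and class names are hypothetical, and the handling of values falling exactly on the 1 : 2, 1 : 1, and 2 : 1 lines is an assumption made here for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class FactorOfTwoSummary:
    """Counts of model/observation pairs by plume-height ratio (hypothetical helper)."""
    below_half: int      # model < 0.5 x observed (below the 1 : 2 line)
    within_below: int    # 0.5 <= ratio < 1 (within a factor of 2, below 1 : 1)
    within_above: int    # 1 <= ratio <= 2 (within a factor of 2, on or above 1 : 1)
    above_double: int    # model > 2 x observed (above the 2 : 1 line)

    @property
    def within_factor_2(self) -> int:
        return self.within_below + self.within_above


def summarize_plume_heights(modelled: Sequence[float],
                            observed: Sequence[float]) -> FactorOfTwoSummary:
    """Classify each modelled/observed plume-height pair by its ratio.

    Assumption: boundary cases (ratio exactly 0.5, 1, or 2) are counted as
    within a factor of 2; the source does not state its convention.
    """
    if len(modelled) != len(observed):
        raise ValueError("modelled and observed must have the same length")

    below_half = within_below = within_above = above_double = 0
    for m, o in zip(modelled, observed):
        if o <= 0:
            raise ValueError("observed plume heights must be positive")
        ratio = m / o
        if ratio < 0.5:
            below_half += 1
        elif ratio < 1.0:
            within_below += 1
        elif ratio <= 2.0:
            within_above += 1
        else:
            above_double += 1
    return FactorOfTwoSummary(below_half, within_below, within_above, above_double)
```

Applied to paired modelled and observed plume heights (in metres), `summarize_plume_heights` returns the four counts, and `within_factor_2` gives the number of cases within a factor of 2 of the observations.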