Air pollution and associated human exposure are important research areas in Greater Sydney, Australia. Several field campaigns were conducted to characterize the pollution sources and their impacts on ambient air quality, including the Sydney Particle Study Stages 1 and 2 (SPS1 and SPS2) and the Measurements of Urban, Marine, and Biogenic Air (MUMBA) campaign. In this work, the Weather Research and Forecasting model with chemistry (WRF/Chem) and WRF/Chem coupled with the Regional Ocean Modeling System (ROMS) (WRF/Chem-ROMS) are applied during these field campaigns to assess the models’ ability to reproduce atmospheric observations. The model simulations are performed over quadruple-nested domains at grid resolutions of 81, 27, 9, and 3 km covering Australia, an area in southeastern Australia, an area in New South Wales, and the Greater Sydney area, respectively. A comprehensive model evaluation is conducted using surface observations from these field campaigns, satellite retrievals, and other data. This paper evaluates the performance of WRF/Chem-ROMS and its sensitivity to spatial grid resolution. The model generally performs well at 3-, 9-, and 27-km resolutions for sea-surface temperature and boundary layer meteorology in terms of performance statistics, seasonality, and daily variation. Moderate biases occur for temperature at 2 m and wind speed at 10 m in the mornings and evenings due to the inaccurate representation of the nocturnal boundary layer and surface heat fluxes. Larger underpredictions occur for total precipitation due to the limitations of the cloud microphysics scheme or cumulus parameterization. The model performs well at 3-, 9-, and 27-km resolutions for surface O3 in terms of statistics, spatial distributions, and diurnal and daily variations. The model underpredicts PM2.5 and PM10 during SPS1 and MUMBA but overpredicts PM2.5 and underpredicts PM10 during SPS2.
These biases are attributed to inaccuracies in meteorology and precursor emissions, insufficient conversion of SO2 to sulfate, inadequate dispersion at finer grid resolutions, and underprediction of secondary organic aerosol. The model gives moderate biases for net shortwave radiation and cloud condensation nuclei but large biases for other radiative and cloud variables. Model performance for aerosol optical depth and latent and sensible heat fluxes varies among the simulation periods. Among all variables evaluated, wind speed at 10 m, precipitation, surface concentrations of CO, NO, NO2, SO2, O3, PM2.5, and PM10, aerosol optical depth, cloud optical thickness, cloud condensation nuclei, and column NO2 show moderate-to-strong sensitivity to spatial grid resolution. The use of finer grid resolutions (3 or 9 km) can generally improve the performance for those variables. Although the performance for most of these variables is consistent with that reported over the U.S. and East Asia, several differences are identified, along with future work needed to pinpoint their causes.
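
The terms "underprediction" and "overprediction" above refer to the sign of the model bias against paired observations. The abstract does not state which performance statistics are used, so the following is only a minimal sketch under the assumption that mean bias (MB) and normalized mean bias (NMB) are among them. The function name `bias_stats` and the sample values are hypothetical and are not data from this study.

```python
import math


def bias_stats(sim, obs):
    """Return (MB, NMB in %) for paired simulated and observed values.

    Assumed definitions (not taken from the paper):
      MB  = mean(sim - obs)
      NMB = 100 * sum(sim - obs) / sum(obs)
    """
    # Keep only pairs in which both values are finite numbers
    pairs = [(s, o) for s, o in zip(sim, obs)
             if math.isfinite(s) and math.isfinite(o)]
    if not pairs:
        raise ValueError("no valid simulation/observation pairs")
    diff_sum = sum(s - o for s, o in pairs)
    obs_sum = sum(o for _, o in pairs)
    mb = diff_sum / len(pairs)
    nmb = 100.0 * diff_sum / obs_sum
    return mb, nmb


# Hypothetical values for illustration only
obs = [10.0, 12.0, 8.0, 10.0]
sim = [8.0, 11.0, 7.0, 10.0]
mb, nmb = bias_stats(sim, obs)
print(f"MB = {mb:.2f}, NMB = {nmb:.1f}%")
```

A negative MB or NMB corresponds to an underprediction and a positive value to an overprediction. Applying the same function to output from each nested domain would be one simple way to compare performance across grid resolutions.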