Machine learning classifiers are increasingly used for Land Use and Land Cover (LULC) mapping from remote sensing images. However, choosing the right classifier requires understanding the main factors that influence classifier performance. The present study first investigated the effect of training sampling design on the classification results obtained with a Random Forest (RF) classifier and then compared RF with other machine learning classifiers for LULC mapping, using multi-temporal satellite remote sensing data and the Google Earth Engine (GEE) platform. We evaluated the impact of three sampling methods, namely Stratified Equal Random Sampling (SRS(Eq)), Stratified Proportional Random Sampling (SRS(Prop)), and Stratified Systematic Sampling (SSS), on the classification results of the RF-trained LULC model. Our results showed that the SRS(Prop) method favors majority classes while achieving good overall accuracy. The SRS(Eq) method provides good class-level accuracies, even for minority classes, whereas the SSS method performs well in areas with large intra-class variability. In the classifier comparison, RF outperformed Classification and Regression Trees (CART), Support Vector Machine (SVM), and Relevance Vector Machine (RVM) at a >95% confidence level. The performance of the CART and SVM classifiers was found to be similar. RVM achieved good classification results with a limited number of training samples.
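The two stratified random designs differ mainly in how the training-sample budget is split across LULC classes. The Python sketch below illustrates equal versus proportional allocation, plus a simple systematic selection within one stratum. It is not the study's GEE implementation: the function names, the class names and pixel counts in the usage example, and the one-dimensional ordering of pixels used for systematic sampling (a stand-in for a spatial grid) are all assumptions made for illustration.

```python
import random


def allocate(class_pixel_counts, total_samples, method):
    """Split a training-sample budget across classes (strata).

    method="equal"        -> SRS(Eq)-style: same number of samples per class.
    method="proportional" -> SRS(Prop)-style: samples proportional to class area.
    """
    classes = sorted(class_pixel_counts)
    if method == "equal":
        base, extra = divmod(total_samples, len(classes))
        # Hand out any remainder one sample at a time, in class-name order.
        return {c: base + (1 if i < extra else 0) for i, c in enumerate(classes)}
    if method == "proportional":
        total_pixels = sum(class_pixel_counts.values())
        alloc = {c: class_pixel_counts[c] * total_samples // total_pixels for c in classes}
        # Largest-remainder rounding so the allocation sums to total_samples.
        leftover = total_samples - sum(alloc.values())
        by_remainder = sorted(
            classes,
            key=lambda c: (class_pixel_counts[c] * total_samples) % total_pixels,
            reverse=True,
        )
        for c in by_remainder[:leftover]:
            alloc[c] += 1
        return alloc
    raise ValueError(f"unknown method: {method!r}")


def stratified_random_sample(pixels_by_class, alloc, rng):
    """Draw alloc[c] pixels at random, without replacement, from each class c."""
    return {
        c: rng.sample(pixels_by_class[c], min(n, len(pixels_by_class[c])))
        for c, n in alloc.items()
    }


def systematic_sample(pixels, n, rng):
    """Pick n pixels at a fixed interval after a random start (one stratum).

    Assumption: `pixels` is already ordered (e.g. row-major), standing in
    for the regular spatial grid of a true systematic design.
    """
    n = min(n, len(pixels))
    if n == 0:
        return []
    step = len(pixels) / n
    start = rng.uniform(0, step)
    return [pixels[min(int(start + i * step), len(pixels) - 1)] for i in range(n)]


# Hypothetical class areas (in pixels) for a study region.
counts = {"water": 500, "forest": 3000, "urban": 1200, "cropland": 5300}
print(allocate(counts, 200, "equal"))
print(allocate(counts, 200, "proportional"))
```

With these hypothetical counts, equal allocation gives every class 50 samples, whereas proportional allocation gives the small water class only 10 and cropland 106. This is one way to see why a proportional design tends to favor majority classes while an equal design protects minority classes.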
Owing to its comparatively high spatial resolution and daily revisit frequency, the tropospheric nitrogen dioxide product provided by the TROPOspheric Monitoring Instrument (TROPOMI) onboard the Sentinel-5 Precursor platform has attracted significant attention for its potential in urban-scale air quality monitoring. However, exploiting such data, for example in the operational assimilation of local-scale dispersion models, is often complicated by substantial data gaps caused by cloud cover or other retrieval limitations. These challenges are particularly prominent in high-latitude regions, where extensive cloud cover and high solar zenith angles are common. Using Norway as a representative high-latitude region, we evaluate the spatiotemporal patterns in the availability of valid data from the operational TROPOMI tropospheric nitrogen dioxide (NO2) product over five urban areas (Oslo, Bergen, Trondheim, Stavanger, and Kristiansand) during a 2.5-year period from July 2018 through November 2020. Our results indicate that even for relatively clean environments such as small Norwegian cities, distinct spatial patterns of tropospheric NO2 are visible in long-term average datasets from TROPOMI. However, the availability of valid data on a daily level is limited by both cloud cover and solar zenith angle (during the winter months), so that the average fraction of valid retrievals ranges from 20% to 50% across the study sites. A temporal analysis shows that, for our study sites and the selected period, the fraction of valid pixels in each domain follows a clear seasonal cycle, reaching a maximum of 50% to 75% in the summer months and dropping to 0% to 20% in winter. This seasonal cycle in data availability is the inverse of that of NO2 pollution in Norway, which typically peaks in the winter months. Nevertheless, outside the mid-winter period we find that the TROPOMI NO2 product provides sufficient data availability for detailed mapping and monitoring of NO2 pollution in the major urban areas of Norway, and we see potential for its use in local-scale data assimilation and emission inversion applications.
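The valid-data fraction described above can be computed per domain and day and then averaged by calendar month to expose the seasonal cycle. The Python sketch below assumes that each day's domain is a grid of TROPOMI `qa_value`s with `None` marking missing retrievals, and that a pixel counts as valid when `qa_value > 0.75` (the threshold commonly recommended for the tropospheric NO2 product). The abstract does not state which filter the study applied, and the function names and toy values are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Assumed validity threshold; the study's actual filter is not stated.
QA_THRESHOLD = 0.75


def valid_fraction(qa_grid, threshold=QA_THRESHOLD):
    """Fraction of pixels in one day's domain grid that pass the QA filter.

    Cells with no retrieval (no overpass, fill value) are encoded as None
    and count as invalid.
    """
    cells = [qa for row in qa_grid for qa in row]
    valid = sum(1 for qa in cells if qa is not None and qa > threshold)
    return valid / len(cells)


def monthly_mean_valid_fraction(daily_grids):
    """Average the daily valid fractions per calendar month (1-12).

    daily_grids maps a datetime.date to that day's qa_value grid.
    """
    by_month = defaultdict(list)
    for day, grid in daily_grids.items():
        by_month[day.month].append(valid_fraction(grid))
    return {month: mean(values) for month, values in sorted(by_month.items())}


# Toy 2x2 domain: two summer days and two winter days (hypothetical values).
daily_grids = {
    date(2019, 7, 1): [[1.0, 1.0], [1.0, 1.0]],
    date(2019, 7, 2): [[1.0, 0.5], [None, 0.9]],
    date(2020, 1, 1): [[None, None], [None, None]],
    date(2020, 1, 2): [[0.8, None], [None, None]],
}
print(monthly_mean_valid_fraction(daily_grids))  # {1: 0.125, 7: 0.75}
```

Months are pooled across years in this sketch, so over a July 2018 to November 2020 period each July value would average the daily fractions from three summers.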
<p>Nitrogen dioxide (NO<sub>2</sub>) is among the major air pollutants in Europe, posing a severe hazard to environmental and human health. Surface NO<sub>2</sub> concentrations are measured by ground monitoring stations, whose number and spatial representativeness are limited. While NO<sub>2</sub> estimates from chemical transport models are realistic, their complexity makes them computationally intensive. Satellite instruments such as TROPOMI observe NO<sub>2</sub> at high spatiotemporal resolution. However, these instruments measure the NO<sub>2</sub> column density integrated over the troposphere rather than the concentration at the surface. Exploiting the availability of ground station measurements and the spatially continuous information from TROPOMI, this study estimates surface NO<sub>2</sub> concentrations over Europe at 1 km spatial resolution for 2019-2021 using an XGBoost machine learning model. Ground measurements are used as the target (reference) variable, while satellite observations such as the tropospheric NO<sub>2</sub> column density (from TROPOMI), night light radiance (from VIIRS), and NDVI (from MODIS), together with modelled meteorological parameters such as planetary boundary layer height, wind velocity, and temperature, are used as input features to the model. During model validation, we find an overall mean absolute error of 7.87 &#181;g/m<sup>3</sup>, a mean bias of -3.13 &#181;g/m<sup>3</sup>, and a Spearman correlation of 0.61. The performance of the model depends on the NO<sub>2</sub> concentration level: predictions are most reliable at concentrations below 40 &#181;g/m<sup>3</sup>, where the relative bias is below 40%. The spatial error analysis also indicates that the model is spatially robust across the study area. The importance of the input features is evaluated using SHapley Additive exPlanations (SHAP), which identify TROPOMI NO<sub>2</sub> as the most important input for the modelled NO<sub>2</sub> predictions. Furthermore, the SHAP values highlight the role of VIIRS night light radiance in resolving finer spatial detail in the surface NO<sub>2</sub> estimates. Despite the complex non-linear relationships among the input features, the trained XGBoost model requires on average 570 seconds to predict a single day of surface NO<sub>2</sub> concentrations over the continental-scale study area. Thus, this work evaluates the importance of TROPOMI data and the reliability of machine learning models for estimating surface NO<sub>2</sub> concentrations at a large spatial scale.</p>