Abstract. Developing a big data analytics framework for generating the
Long-term Gap-free High-resolution Air Pollutant concentration dataset
(abbreviated as LGHAP) is of great significance for environmental management
and Earth system science analysis. A gap-free aerosol optical depth (AOD)
dataset with daily temporal and 1 km spatial resolution, covering China for
2000–2020, was generated by synergistically integrating multimodal aerosol
data from diverse sources via a tensor-flow-based data fusion method.
Specifically, data gaps in daily AOD images from the Moderate Resolution Imaging Spectroradiometer (MODIS) aboard Terra were
reconstructed based on a set of AOD data tensors acquired from diverse
satellites, numerical analysis, and in situ air quality measurements via
integrative efforts of spatial pattern recognition for high-dimensional
gridded image analysis and knowledge transfer in statistical data mining. To
our knowledge, this is the first long-term gap-free high-resolution AOD
dataset in China, from which spatially contiguous PM2.5 and PM10
concentrations were then estimated using an ensemble learning approach.
Ground validation results indicate that the LGHAP AOD data are in good
agreement with in situ AOD observations from the Aerosol Robotic Network (AERONET), with a correlation coefficient (R) of 0.91 and a root-mean-square error (RMSE)
of 0.21. The PM2.5 and PM10 estimates also
agreed well with ground measurements, with R values of 0.95 and 0.94 and RMSEs of
12.03 and 19.56 µg m−3, respectively. The LGHAP provides a suite
of long-term gap-free gridded maps with a high resolution to better examine
aerosol changes in China over the past 2 decades, from which three major
variation periods of haze pollution in China were revealed. Additionally,
the proportion of the population exposed to unhealthy PM2.5 levels increased
from 50.60 % in 2000 to 63.81 % in 2014 across China and then
fell sharply to 34.03 % in 2020. Overall, the generated LGHAP
dataset has great potential to trigger multidisciplinary applications in
Earth observations, climate change, public health, ecosystem assessment, and
environmental management. The daily AOD, PM2.5, and
PM10 datasets are publicly available at https://doi.org/10.5281/zenodo.5652257 (Bai et al., 2021a), https://doi.org/10.5281/zenodo.5652265 (Bai et al., 2021b), and https://doi.org/10.5281/zenodo.5652263 (Bai et al., 2021c), respectively.
Monthly and annual datasets can be acquired from https://doi.org/10.5281/zenodo.5655797 (Bai et al., 2021d) and https://doi.org/10.5281/zenodo.5655807 (Bai et al., 2021e), respectively.
Python, MATLAB, R, and IDL scripts are also provided to help users read and
visualize these data.
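
As a rough illustration of the two validation statistics quoted above, the following minimal Python sketch computes a correlation coefficient and an RMSE for paired gridded estimates and ground observations. It assumes that R denotes the Pearson correlation coefficient. The function names and sample values are hypothetical and are not taken from the LGHAP code or data.

```python
import math


def pearson_r(estimates, observations):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(estimates)
    mean_e = sum(estimates) / n
    mean_o = sum(observations) / n
    cov = sum((e - mean_e) * (o - mean_o)
              for e, o in zip(estimates, observations))
    var_e = sum((e - mean_e) ** 2 for e in estimates)
    var_o = sum((o - mean_o) ** 2 for o in observations)
    return cov / math.sqrt(var_e * var_o)


def rmse(estimates, observations):
    """Root-mean-square error between two equal-length sequences."""
    n = len(estimates)
    squared_error = sum((e - o) ** 2 for e, o in zip(estimates, observations))
    return math.sqrt(squared_error / n)


# Made-up values for illustration only; not LGHAP or AERONET data.
gridded_aod = [0.30, 0.55, 0.80, 1.10]
station_aod = [0.25, 0.60, 0.75, 1.20]

r_value = pearson_r(gridded_aod, station_aod)
rmse_value = rmse(gridded_aod, station_aod)
print(f"R = {r_value:.2f}, RMSE = {rmse_value:.2f}")
```

In practice the paired samples would be gridded values collocated with station measurements in space and time; the collocation step is not shown here.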