Abstract. Wireless low-cost particulate matter sensor networks
(WLPMSNs) are transforming air quality monitoring by providing particulate matter (PM)
information at finer spatial and temporal resolutions. However, large-scale WLPMSN calibration and maintenance remain a challenge. The manual labor involved in initial calibration by collocation and routine recalibration is intensive. The transferability of the calibration models determined from initial collocation to new deployment sites is questionable, as calibration factors typically vary with the
urban heterogeneity of operating conditions and aerosol optical properties. Furthermore, the response of low-cost sensors can drift or degrade over time. This study presents a simultaneous
Gaussian process regression (GPR) and simple linear regression pipeline to
calibrate and monitor dense WLPMSNs on the fly by leveraging all available
reference monitors across an area without resorting to pre-deployment
collocation calibration. We evaluated our method for Delhi, where the
PM2.5 measurements of all 22 regulatory reference and 10 low-cost
nodes were available for 59 d from 1 January to 31 March 2018
(PM2.5 averaged 138±31 µg m−3 among 22 reference
stations), using a leave-one-out cross-validation (CV) over the 22 reference
nodes. We showed that our approach can achieve an overall 30 % prediction
error (RMSE: 33 µg m−3) at a 24 h scale and that it is robust, as
underscored by the small variability in the GPR model parameters and in the
model-produced calibration factors for the low-cost nodes across the 22 CV
folds. Of the 22 reference stations, high-quality predictions were observed for
those stations whose PM2.5 means were close to the Delhi-wide mean
(i.e., 138±31 µg m−3), and relatively poor predictions were observed for
those nodes whose means differed substantially from the Delhi-wide mean
(particularly on the lower end). We also observed washed-out local
variability in PM2.5 across the 10 low-cost sites after calibration
using our approach, which stands in marked contrast to the true wide
variability across the reference sites. These observations revealed that our
proposed technique (and more generally the geostatistical technique)
requires high spatial homogeneity in the pollutant concentrations to be
fully effective. We further demonstrated that the performance of our algorithm is
insensitive to training window size, as the mean prediction error rate and
the standard error of the mean (SEM) for the 22 reference stations remained
consistent at ∼30 % and ∼3 %–4 %, respectively, as the training window was
extended in increments of 2 d. Because the algorithm requires so little
training data, the models can be kept nearly up to date in the field,
thus realizing the algorithm's
full potential for dynamically surveilling large-scale WLPMSNs by detecting
malfunctioning low-cost nodes and tracking the drift with little latency.
Our algorithm presented similarly stable 26 %–34 % mean prediction errors
and ∼3 %–7 % SEMs over the sampling period when pre-trained
on the current week's data and predicting 1 week ahead, and therefore it is suitable
for online calibration. Simulations conducted using our algorithm suggest
that in addition to dynamic calibration, the algorithm can also be adapted
for automated monitoring of large-scale WLPMSNs. In these simulations, the
algorithm was able to differentiate malfunctioning low-cost nodes (due to
either hardware failure or the heavy influence of local sources)
within a network by identifying aberrant model-generated calibration
factors (i.e., slopes close to zero and intercepts close to the Delhi-wide
mean of true PM2.5). The algorithm was also able to track the drift of
low-cost nodes accurately, to within 4 % error, in all the simulation
scenarios. The simulation results showed that ∼20 reference
stations are optimum for our solution in Delhi and confirmed that low-cost
nodes can extend the spatial precision of a network by decreasing the extent
of pure interpolation among only reference stations. Our solution has
substantial implications for reducing the amount of manual labor for the
calibration and surveillance of extensive WLPMSNs, improving the spatial
comprehensiveness of PM evaluation, and enhancing the accuracy of WLPMSNs.
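
The malfunction criterion described above (slopes close to zero and intercepts close to the city-wide mean of true PM2.5) can be illustrated with a minimal sketch. It is not the pipeline used in this study: the GPR step is replaced here by a given target series standing in for the field estimate at the node, the calibration is an ordinary least-squares line, and the function names, the thresholds `slope_tol` and `intercept_rel_tol`, and the synthetic series are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_calibration(raw, target):
    """Fit target ≈ slope * raw + intercept by ordinary least squares."""
    slope, intercept = np.polyfit(np.asarray(raw, float), np.asarray(target, float), deg=1)
    return float(slope), float(intercept)


def is_aberrant(slope, intercept, citywide_mean, slope_tol=0.1, intercept_rel_tol=0.2):
    """Flag a node whose slope is near zero and whose intercept is near the city-wide mean.

    slope_tol and intercept_rel_tol are hypothetical thresholds, not values from the study.
    """
    near_zero_slope = abs(slope) < slope_tol
    near_mean_intercept = abs(intercept - citywide_mean) <= intercept_rel_tol * citywide_mean
    return bool(near_zero_slope and near_mean_intercept)


# Synthetic 60-sample series (illustrative only; not data from the study).
t = np.arange(60)
target = 138.0 + 31.0 * np.sin(2 * np.pi * t / 60)       # stand-in for the field estimate at the node
healthy_raw = (target - 10.0) / 0.8                      # node that tracks PM2.5 linearly
faulty_raw = 500.0 + 300.0 * np.cos(2 * np.pi * t / 60)  # node whose signal is unrelated to PM2.5

for name, raw in [("healthy", healthy_raw), ("faulty", faulty_raw)]:
    slope, intercept = fit_calibration(raw, target)
    print(name, round(slope, 3), round(intercept, 1), is_aberrant(slope, intercept, target.mean()))
```

In this sketch the healthy node recovers the linear relation used to generate it, whereas the faulty node's readings carry no information about the target, so the fitted line collapses to a near-zero slope with an intercept at the mean of the target series, which is the pattern the criterion looks for.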