Missing data has been a challenge in air quality measurement. In this study, we develop an input-adaptive proxy that selects its input variables from other air quality variables based on their correlation coefficients with the output variable. The proxy uses a least squares regression model with robust (bisquare) weighting and, to avoid overfitting, limits the number of input variables to a maximum of three. The adaptive proxy learns from the data set and ranks the generated models by adjusted coefficient of determination (adjR2), using the best model first. Where input variables of that model have missing data, the proxy uses the second-best model, and so on down the ranking, until all the missing data gaps are filled. As a case study, we estimated black carbon (BC) concentration with the input-adaptive proxy at two sites in Helsinki, representing a street canyon and an urban background setting, respectively. Accumulation mode particles, traffic counts, nitrogen dioxide and lung deposited surface area were found as input variables in the top-ranked models. In contrast to the traditional proxy, which fills only 20–80% of the missing data, the input-adaptive proxy provides a full, continuous BC estimate. The adaptive proxy also estimates BC with generally high accuracy (street canyon: adjR2 = 0.86–0.94; urban background: adjR2 = 0.74–0.91), depending on the season and day of the week. Owing to its flexibility and reliability, the adaptive proxy can be extended to estimate other air quality parameters. In the future, it could also act as an air quality virtual sensor in support of on-site measurements.
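The selection-and-fallback procedure described above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: it fits plain ordinary least squares instead of the robust fit, keeps the five inputs most correlated with the output as candidates (`n_candidates=5` is an assumption; the abstract only fixes the maximum of three inputs per model), and ranks models by adjR2 alone. The function names `adjusted_r2`, `build_models` and `fill_gaps` are hypothetical.

```python
from itertools import combinations

import numpy as np


def adjusted_r2(y, y_hat, n_inputs):
    """adjR2 = 1 - (1 - R2) * (n - 1) / (n - p - 1), with p = number of inputs."""
    n = len(y)
    r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_inputs - 1)


def build_models(X, y, n_candidates=5, max_inputs=3):
    """Fit OLS models on 1..max_inputs of the inputs most correlated with y.

    X: (n, m) candidate inputs with NaN for gaps; y: (n,) output with NaN gaps.
    Returns a list of (adjR2, input column indices, coefficients), best first.
    Plain OLS here is a simplification of the robust fit used in the study.
    """
    # Rank candidate inputs by |Pearson r| with y on pairwise-complete rows.
    r = []
    for j in range(X.shape[1]):
        ok = ~np.isnan(X[:, j]) & ~np.isnan(y)
        r.append(abs(np.corrcoef(X[ok, j], y[ok])[0, 1]))
    candidates = [int(j) for j in np.argsort(r)[::-1][:n_candidates]]

    models = []
    for k in range(1, max_inputs + 1):  # at most three inputs per model by default
        for subset in combinations(candidates, k):
            cols = list(subset)
            # Train only on rows where y and every input in this subset are present.
            rows = ~np.isnan(y) & ~(np.isnan(X[:, cols]).any(axis=1))
            A = np.column_stack([np.ones(rows.sum()), X[rows][:, cols]])
            coef, *_ = np.linalg.lstsq(A, y[rows], rcond=None)
            models.append((adjusted_r2(y[rows], A @ coef, k), cols, coef))
    models.sort(key=lambda m: m[0], reverse=True)
    return models


def fill_gaps(X, y, models):
    """Keep measured y; fill each gap with the best-ranked model whose inputs
    are all present at that time step, falling back down the ranking."""
    y_filled = y.copy()
    for t in np.flatnonzero(np.isnan(y)):
        for _, cols, coef in models:
            x = X[t, cols]
            if not np.isnan(x).any():
                y_filled[t] = coef[0] + x @ coef[1:]
                break
    return y_filled
```

Because single-input models are also ranked, a gap in this sketch stays unfilled only if every candidate input is missing at that same time step.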
Air pollution contributes to approximately one in every nine deaths annually. Air quality monitoring is carried out extensively in urban environments. Currently, however, city air quality stations are expensive to maintain, which results in sparse coverage, and their data are not readily available to citizens. This can be addressed by city-wide participatory sensing of air quality fluctuations using low-cost sensors. We introduce new concepts for participatory sensing: a voluntary, community-based monitoring data forum for stakeholders to manage air pollution interventions; an automated system (cyber-physical system) for monitoring outdoor and indoor air quality; and a programmable platform for calibration and for generating virtual sensors using data from low-cost sensors and city monitoring stations. To test our concepts, we developed a low-cost sensor with GPS that measures particulate matter (PM2.5), nitrogen dioxide (NO2), carbon monoxide (CO), and ozone (O3). We validated our approach in Helsinki, Finland, with participants carrying the sensor for 3 months during six data campaigns between 2019 and 2021. We demonstrate good correspondence between the calibrated low-cost sensor data and the city's monitoring station measurements. An analysis of each participant's personal exposure was made available to them and stored as historical data for later use. By combining the locations of the low-cost sensor data with participants' public profiles, we generate proxy concentrations for black carbon and lung deposition of particles across districts, age groups and days of the week.
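The abstract does not specify the calibration model. As a hypothetical illustration only, a simple linear (gain and offset) calibration of a low-cost sensor against a co-located city monitoring station, plus a squared-correlation check of their correspondence, could look like the sketch below. The linear model and the function names `fit_linear_calibration`, `apply_calibration` and `correspondence_r2` are assumptions; practical low-cost sensor calibrations often add covariates such as temperature and relative humidity.

```python
import numpy as np


def fit_linear_calibration(raw, reference):
    """Assumed model: reference ≈ gain * raw + offset, fitted by least squares
    over a co-location period, ignoring time steps where either series is NaN."""
    ok = ~np.isnan(raw) & ~np.isnan(reference)
    gain, offset = np.polyfit(raw[ok], reference[ok], deg=1)
    return gain, offset


def apply_calibration(raw, gain, offset):
    """Convert raw low-cost sensor readings into calibrated concentrations."""
    return gain * raw + offset


def correspondence_r2(calibrated, reference):
    """Squared Pearson correlation between calibrated and reference data."""
    ok = ~np.isnan(calibrated) & ~np.isnan(reference)
    return np.corrcoef(calibrated[ok], reference[ok])[0, 1] ** 2
```

In use, the gain and offset would be fitted during a co-location period and then applied to readings collected elsewhere by participants.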
<p>Urban air pollution is a global challenge, and continuous air quality measurement is important for understanding the nature of the problem. However, missing data are often an issue in air quality measurement. In this study, we present a modified method to impute missing data using an input-adaptive proxy. We used black carbon (BC) concentration data from the Mäkelänkatu traffic site (TR) and the Kumpula urban background site (BG) in Helsinki, Finland, in 2017–2018 as training sets. The input-adaptive proxy selected its input variables from the other air quality variables based on their Pearson correlation coefficients with BC. The proxy used a least squares model with a bisquare weighting function and, to avoid overfitting, allowed a maximum of three input variables. The generated models were then evaluated and ranked by adjusted coefficient of determination (adjR<sup>2</sup>), mean absolute error and root mean square error. BC concentration was first estimated with the best model. Where input variables of the best model had missing data, the input-adaptive proxy used the second-best model, and so on, until all the missing data gaps were filled.</p><p>The input-adaptive proxy filled 100% of the missing data gaps, whereas the traditional proxy filled only 20–80% of the missing BC data. Furthermore, the overall performance of the input-adaptive proxy is reliable both at TR (adjR<sup>2</sup>=0.86–0.94) and at BG (adjR<sup>2</sup>=0.74–0.91). TR has generally better regression performance because the BC level there is largely explained by traffic counts, nitrogen oxides and accumulation mode particles. In contrast, the sources of BC at BG are more heterogeneous, including traffic emissions and residential combustion, and BC concentration there is also influenced by meteorological parameters; therefore, the limit of at most three input variables may explain the lower adjR<sup>2</sup>.
The proxy works slightly better on workdays than on weekends at both sites. At TR, the proxy performs similarly in all seasons, while at BG, it performs better in winter and autumn than in the other seasons. The simplicity, full coverage and high reliability of the input-adaptive proxy make it a sound approach for estimating other air quality parameters. Moreover, it can act as an air quality virtual sensor alongside on-site instruments.</p>
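A least squares fit with a bisquare weighting function, as mentioned above, is commonly computed by iteratively reweighted least squares (IRLS). The sketch below assumes Tukey's bisquare weight w(u) = (1 - u^2)^2 for |u| < 1 and 0 otherwise, with u = r / (c * s), the conventional tuning constant c = 4.685, and a robust scale s taken as the median absolute residual divided by 0.6745. It omits refinements such as leverage adjustment that some library implementations apply, so it illustrates the technique rather than reproducing the study's exact code; `bisquare_weights` and `robust_fit` are hypothetical names.

```python
import numpy as np


def bisquare_weights(residuals, c=4.685):
    """Tukey bisquare weights for the given residuals (c = 4.685 assumed)."""
    # Robust scale: median absolute residual, rescaled to match sigma for normal errors.
    s = np.median(np.abs(residuals)) / 0.6745
    if s == 0:
        return np.ones_like(residuals)  # exact fit: nothing to downweight
    u = residuals / (c * s)
    return np.where(np.abs(u) < 1, (1 - u**2) ** 2, 0.0)


def robust_fit(X, y, n_iter=50, tol=1e-8):
    """IRLS with bisquare weights.

    X: (n, p) inputs without an intercept column; y: (n,) output.
    Returns coefficients [intercept, slope_1, ..., slope_p].
    """
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # ordinary least squares start
    for _ in range(n_iter):
        w = bisquare_weights(y - A @ coef)
        sw = np.sqrt(w)
        # Weighted least squares: scale each row by sqrt(weight) and refit.
        new_coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        converged = np.max(np.abs(new_coef - coef)) < tol
        coef = new_coef
        if converged:
            break
    return coef
```

Observations with large residuals receive zero weight, so a few outliers in the training set (for example, short pollution spikes) pull the fitted model far less than they would with plain least squares.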