This paper presents an original approach that combines Artificial Neural Networks (ANNs) and clustering to detect pollutant peaks. We developed air quality forecasting models that apply machine learning methods to hourly concentrations of ozone (O₃), nitrogen dioxide (NO₂) and particulate matter (PM₁₀) to produce forecasts 24 hours ahead. A MultiLayer Perceptron (MLP) was first used alone, then hybridized successively with hierarchical clustering and with a combination of a self-organizing map and k-means clustering. In the hybrid models, the clustering method subdivides the dataset and a separate MLP is trained on each subset. Two urban sites on the island of Corsica, in the western Mediterranean Sea, were investigated. The models showed good overall accuracy (Index of Agreement reaching 0.87 for O₃, 0.80 for NO₂ and 0.74 for PM₁₀). Because it is particularly important that a forecasting model used operationally correctly predicts pollution peaks, a sensitivity analysis was performed using Receiver Operating Characteristic (ROC) curves. This analysis made it possible to evaluate the behaviour and robustness of the models in high-concentration situations. The results show that, for PM₁₀ and O₃, hybrid models combining clustering and MLP outperform the classical MLP most of the time for high-concentration prediction. An operational tool has been built with the models presented in this paper and is used for air quality forecasting in Corsica.
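To illustrate how a ROC analysis can be applied to peak detection, the following minimal sketch treats a forecast as an alarm whenever the predicted concentration reaches a decision level, and compares the alarms with observed exceedances of a peak level. The function name `roc_points`, the peak level of 50 and the sample concentrations are assumptions made for illustration only; they are not taken from the paper, whose exact procedure is not described here.

```python
def roc_points(observed, predicted, peak_level, decision_levels):
    """Return one (false_positive_rate, true_positive_rate) pair per decision level.

    An observation counts as a peak when it is >= peak_level (hypothetical value).
    A forecast raises an alarm when it is >= the current decision level.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    is_peak = [o >= peak_level for o in observed]
    n_pos = sum(is_peak)
    n_neg = len(is_peak) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one peak and one non-peak observation")

    points = []
    for level in decision_levels:
        alarms = [p >= level for p in predicted]
        true_pos = sum(a and k for a, k in zip(alarms, is_peak))
        false_pos = sum(a and not k for a, k in zip(alarms, is_peak))
        points.append((false_pos / n_neg, true_pos / n_pos))
    return points


# Illustrative (made-up) concentrations, not data from the study.
observed = [20, 35, 60, 80, 45, 55]
predicted = [25, 30, 50, 70, 52, 40]
points = roc_points(observed, predicted, peak_level=50, decision_levels=[40, 50, 60])
```

Lowering the decision level moves along the curve towards more detected peaks at the cost of more false alarms, which is the trade-off a ROC curve makes visible when comparing models.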
Forecasting atmospheric pollutant concentrations is an important issue in air quality monitoring. Qualitair Corse, the organization responsible for monitoring air quality in Corsica (France), needs a short-term prediction model to fulfil its mission of informing the public. Various deterministic models exist for local forecasting, but they require substantial computing resources and a good knowledge of atmospheric processes, and they can be inaccurate because of local climatic or geographical particularities, as observed in Corsica, a mountainous island in the Mediterranean Sea. This study therefore focuses on statistical models, and particularly on Artificial Neural Networks (ANNs), which have shown good results in predicting ozone concentration one hour ahead from locally measured data. The purpose of this study is to build a predictor that forecasts ozone 24 hours ahead in Corsica, so that the formation of pollution peaks can be anticipated and appropriate preventive measures taken. Specific meteorological conditions are known to lead to particular pollution events in Corsica (e.g. Saharan dust events). An ANN model is therefore used with pollutant and meteorological data for operational forecasting. The index of agreement of this model, calculated on a one-year test dataset, reached 0.88.
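For readers unfamiliar with the index of agreement, a minimal sketch follows. It assumes the original Willmott formulation, d = 1 − Σ(Pᵢ − Oᵢ)² / Σ(|Pᵢ − Ō| + |Oᵢ − Ō|)², where O are observations, P are predictions and Ō is the observed mean; the abstract does not state which variant was used, and the function name `index_of_agreement` is hypothetical.

```python
def index_of_agreement(observed, predicted):
    """Willmott's index of agreement d, between 0 and 1 (1 = perfect agreement)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Sum of squared prediction errors.
    numerator = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    # Potential error: spread of predictions and observations around the observed mean.
    denominator = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, predicted)
    )
    if denominator == 0:
        raise ValueError("index is undefined when all values equal the observed mean")
    return 1.0 - numerator / denominator


# Made-up values for illustration only.
perfect = index_of_agreement([1, 2, 3], [1, 2, 3])
constant = index_of_agreement([1, 2, 3], [2, 2, 2])
```

A forecast that reproduces every observation scores 1, while a forecast that always returns the observed mean scores 0 under this formulation, so values such as 0.88 indicate close but not perfect agreement.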