Purpose
Road traffic emissions are generally believed to contribute immensely to air pollution, but the effect of road traffic data sets on air quality (AQ) predictions has not been fully investigated. This paper aims to investigate the effects traffic data set have on the performance of machine learning (ML) predictive models in AQ prediction.
Design/methodology/approach
To achieve this, the authors have set up an experiment with the control data set having only the AQ data set and meteorological (Met) data set, while the experimental data set is made up of the AQ data set, Met data set and traffic data set. Several ML models (such as extra trees regressor, eXtreme gradient boosting regressor, random forest regressor, K-neighbors regressor and two others) were trained, tested and compared on these individual combinations of data sets to predict the volume of PM2.5, PM10, NO2 and O3 in the atmosphere at various times of the day.
Findings
The result obtained showed that various ML algorithms react differently to the traffic data set despite generally contributing to the performance improvement of all the ML algorithms considered in this study by at least 20% and an error reduction of at least 18.97%.
Research limitations/implications
This research is limited in terms of the study area, and the result cannot be generalized outside of the UK as some of the inherent conditions may not be similar elsewhere. Additionally, only the ML algorithms commonly used in literature are considered in this research, therefore, leaving out a few other ML algorithms.
Practical implications
This study reinforces the belief that the traffic data set has a significant effect on improving the performance of air pollution ML prediction models. Hence, there is an indication that ML algorithms behave differently when trained with a form of traffic data set in the development of an AQ prediction model. This implies that developers and researchers in AQ prediction need to identify the ML algorithms that behave in their best interest before implementation.
Originality/value
The result of this study will enable researchers to focus more on algorithms of benefit when using traffic data sets in AQ prediction.