The increasing use of new data sources and machine learning models in transport modelling raises concerns about potentially unfair model-based decisions that rely on gender, age, ethnicity, nationality, income, education or other socio-economic and demographic data. We demonstrate the impact of such algorithmic bias and explore best practices for addressing it, using three representative supervised learning models of varying complexity. We also analyse how different kinds of data (survey data vs. big data) may be associated with different levels of bias. The methodology we propose detects bias in a model and implements measures to mitigate it. Specifically, three bias mitigation algorithms are implemented, one at each stage of the model development pipeline: before the classifier is trained (pre-processing), while the classifier is trained (in-processing) and after classification (post-processing). Because these debiasing techniques inevitably affect the accuracy with which individual behaviour is predicted, comparing different types of models and algorithms allows us to determine which techniques provide the best balance between bias mitigation and accuracy loss in each case. This approach improves model transparency and provides an objective assessment of model fairness. The results reveal that mode choice models are indeed affected by algorithmic bias, and show that off-the-shelf mitigation techniques yield fairer classification models.
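To make the bias detection step and the fairness/accuracy trade-off concrete, the following is a minimal sketch in plain Python. The abstract does not name the fairness metrics used, so statistical parity difference and disparate impact are assumed here as two common group-fairness measures; the toy data, the group labels `"a"` and `"b"`, and all function names are hypothetical and purely illustrative.

```python
def selection_rate(preds, groups, group):
    """Share of individuals in `group` that receive the positive prediction."""
    chosen = [p for p, g in zip(preds, groups) if g == group]
    return sum(chosen) / len(chosen)


def statistical_parity_difference(preds, groups, unprivileged, privileged):
    """Selection rate of the unprivileged group minus that of the privileged group.

    A value of 0 means both groups receive the positive outcome equally often.
    """
    return (selection_rate(preds, groups, unprivileged)
            - selection_rate(preds, groups, privileged))


def disparate_impact(preds, groups, unprivileged, privileged):
    """Ratio of selection rates; a value of 1 means parity between the groups."""
    return (selection_rate(preds, groups, unprivileged)
            / selection_rate(preds, groups, privileged))


def accuracy(preds, labels):
    """Fraction of predictions that match the observed labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical toy data: a binary prediction (e.g. "chooses a given mode")
# for eight individuals split across two demographic groups.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels = [1, 1, 0, 0, 1, 1, 0, 0]
preds = [1, 1, 1, 0, 1, 0, 0, 0]

# Group "a" is treated as privileged and "b" as unprivileged (an assumption).
spd = statistical_parity_difference(preds, groups, unprivileged="b", privileged="a")
di = disparate_impact(preds, groups, unprivileged="b", privileged="a")
acc = accuracy(preds, labels)

print(f"statistical parity difference: {spd:.3f}")
print(f"disparate impact: {di:.3f}")
print(f"accuracy: {acc:.3f}")
```

Computing the same quantities before and after applying a mitigation step is one way to compare how much bias is removed against how much accuracy is lost.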