As decision-making increasingly relies on machine learning (ML) and (big) data, the issue of fairness in data-driven artificial intelligence systems is receiving growing attention from both research and industry. A large variety of fairness-aware ML solutions have been proposed, involving fairness-related interventions in the data, the learning algorithms, and/or the model outputs. A vital part of proposing new approaches, however, is evaluating them empirically on benchmark datasets that represent realistic and diverse settings. Therefore, in this paper, we give an overview of real-world datasets used for fairness-aware ML. We focus on tabular data, the most common data representation for fairness-aware ML. We start our analysis by using a Bayesian network to identify relationships between the different attributes, particularly with respect to the protected attributes and the class attribute. For a deeper understanding of bias in the datasets, we then investigate interesting relationships using exploratory analysis.
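As a rough illustration of the kind of exploratory check that relates a protected attribute to the class attribute, the sketch below computes the positive-class rate per protected group and their difference (statistical parity difference). This is a simpler, swapped-in technique, not the Bayesian network analysis described above. The function names, the column names `sex` and `income`, and the toy records are all hypothetical and invented for illustration; they are not taken from any benchmark dataset.

```python
from collections import defaultdict


def positive_rate_by_group(rows, protected, label, positive):
    """Return P(label == positive | protected == g) for every group g (hypothetical helper)."""
    counts = defaultdict(lambda: [0, 0])  # group -> [number of positives, group size]
    for row in rows:
        group = row[protected]
        counts[group][1] += 1
        if row[label] == positive:
            counts[group][0] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}


def statistical_parity_difference(rates, privileged, unprivileged):
    """Positive rate of the unprivileged group minus that of the privileged group."""
    return rates[unprivileged] - rates[privileged]


# Hypothetical toy records, invented for this sketch.
rows = [
    {"sex": "male", "income": ">50K"},
    {"sex": "male", "income": "<=50K"},
    {"sex": "male", "income": ">50K"},
    {"sex": "male", "income": "<=50K"},
    {"sex": "female", "income": ">50K"},
    {"sex": "female", "income": "<=50K"},
    {"sex": "female", "income": "<=50K"},
    {"sex": "female", "income": "<=50K"},
]

rates = positive_rate_by_group(rows, "sex", "income", ">50K")
print(rates)  # {'male': 0.5, 'female': 0.25}
print(statistical_parity_difference(rates, "male", "female"))  # -0.25
```

A value of zero would mean both groups receive the positive label at the same rate; a negative value means the unprivileged group receives it less often. A Bayesian network goes further by also capturing indirect dependencies through other attributes, which a single pairwise rate cannot show.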
This article is categorized under:
Commercial, Legal, and Ethical Issues > Fairness in Data Mining
Fundamental Concepts of Data and Knowledge > Data Concepts
Technologies > Data Preprocessing
Taxis are a convenient means of transportation worldwide. Accurately predicting taxi demand is crucial for taxi companies to allocate their fleet effectively to taxi stands and to reduce passenger waiting times, thereby increasing overall customer satisfaction and retention. Nowadays, precise information about taxi rides is available and can be used to infer taxi-passenger demand across different locations and time points. In this paper, we propose an approach for predicting the pick-up demand at a given taxi stand that takes into account not only the demand history of that particular stand but also information from neighboring stands. Our model is an LSTM neural network augmented with information from the spatial neighborhood of the stands. Experiments with two versions of the taxi demand dataset from the city of Porto, Portugal, show that our approach can provide better predictions than approaches that do not exploit the neighborhood.
Keywords: Taxi-passenger demand • Time series prediction • LSTM • k-nearest neighbors • Deep learning • Neural networks
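To make the idea of "augmenting a stand's history with its spatial neighborhood" concrete, the sketch below builds one input sequence for a sequence model: at each past time step it stacks the target stand's own pick-up count with the counts of its k nearest stands. It assumes neighbors are chosen by k-nearest neighbors on planar (x, y) coordinates with Euclidean distance, that demand is already aggregated into equal time bins, and that the resulting sequence would then be fed to an LSTM, which is omitted here. The function names, coordinates, and demand counts are hypothetical, not the paper's implementation or data.

```python
import math


def k_nearest_stands(coords, stand, k):
    """Return the ids of the k stands closest to `stand` (Euclidean distance, assumed)."""
    origin = coords[stand]
    others = [s for s in coords if s != stand]
    others.sort(key=lambda s: math.dist(coords[s], origin))
    return others[:k]


def build_sample(demand, coords, stand, t, window, k):
    """Build (input sequence, target) for predicting demand[stand][t].

    Each of the `window` time steps before t holds the stand's own count,
    followed by the counts of its k nearest neighboring stands.
    """
    if t - window < 0:
        raise ValueError("not enough history for the requested window")
    neighbors = k_nearest_stands(coords, stand, k)
    seq = []
    for step in range(t - window, t):
        features = [demand[stand][step]] + [demand[n][step] for n in neighbors]
        seq.append(features)
    return seq, demand[stand][t]


# Hypothetical stands and per-bin pick-up counts, invented for this sketch.
coords = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.0, 2.0), "D": (5.0, 5.0)}
demand = {"A": [3, 4, 5, 6], "B": [1, 2, 2, 3], "C": [0, 1, 1, 2], "D": [9, 9, 9, 9]}

seq, target = build_sample(demand, coords, "A", t=3, window=2, k=2)
print(seq)     # [[4, 2, 1], [5, 2, 1]]
print(target)  # 6
```

Here stand "A" picks "B" and "C" as its neighbors and ignores the distant "D". With real GPS coordinates, a great-circle (haversine) distance would be a more appropriate choice than the planar Euclidean distance assumed above.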