Although algorithms are increasingly used to guide real-world decision-making, their potential for propagating bias remains challenging to measure. A common approach for researchers and practitioners examining algorithms for unintended discriminatory biases is to assess group fairness, which compares outcomes across sensitive or protected demographic features such as race, gender, or age. In practice, however, data representing these group attributes is often not collected, or may be unavailable due to policy, legal, or other constraints. As a result, practitioners often find themselves tasked with assessing fairness in the face of these missing features. In such cases, they can forgo a bias audit, obtain the missing data directly, or impute it. Because obtaining additional data is often prohibitively expensive or raises privacy concerns, many practitioners attempt to impute missing data using proxies. Through a survey of the data used in algorithmic fairness literature, which we make public to facilitate future research, we show that, when available at all, most publicly available proxy sources take the form of summary tables, which contain only aggregate statistics about a population. Prior work has found that these proxies are not predictive enough on their own to accurately measure group fairness. Even proxy variables that are correlated with group attributes contain noise (i.e., they predict attributes for a subset of the population effectively at random).
Here, we outline a method for improving accuracy in measuring group fairness using summary tables. Specifically, we propose focusing only on highly predictive values within proxy variables, and we outline the conditions under which these proxies can estimate fairness disparities with high accuracy. We then show that a major disqualifying criterion, an association between the proxy and the outcome, can be controlled for using causal inference. Finally, we show that when proxy data is missing altogether, our approach is applicable to rule-based proxies constructed by applying subject-matter context to the original data alone. Crucially, we are able to extract information on group disparities from proxies that may have low discriminatory power at the population level. We illustrate our results through a variety of case studies with real and simulated data. In all, we present a viable method for assessing fairness in the face of missing data, with limited privacy implications and without needing to rely on complex, expensive, or proprietary data sources.
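To make the idea of restricting attention to highly predictive proxy values concrete, the following is a minimal sketch and not the estimator developed in this work. It assumes a summary table that gives, for each proxy value, the probability of belonging to a hypothetical group "A", and it assumes the proxy is unrelated to the outcome within each group (the disqualifying criterion mentioned above). The function name `estimate_disparity`, the 0.9 threshold, and the toy data are all illustrative assumptions.

```python
from collections import defaultdict


def estimate_disparity(records, summary_table, threshold=0.9):
    """Estimate an outcome-rate gap using only highly predictive proxy values.

    records: iterable of (proxy_value, outcome) pairs, with outcome in {0, 1}.
    summary_table: dict mapping proxy_value -> P(group == "A" | proxy_value).
    threshold: minimum group probability for a proxy value to be kept
               (an illustrative choice, not a recommended setting).
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for value, outcome in records:
        p_a = summary_table.get(value)
        if p_a is None:
            continue  # proxy value not covered by the summary table
        if p_a >= threshold:
            group = "A"
        elif p_a <= 1 - threshold:
            group = "B"
        else:
            continue  # ambiguous proxy value: discarded
        totals[group] += 1
        positives[group] += outcome
    if not totals["A"] or not totals["B"]:
        raise ValueError("no highly predictive proxy values for one group")
    # Difference in positive-outcome rates between the two proxy-defined groups.
    return positives["A"] / totals["A"] - positives["B"] / totals["B"]


# Toy data (hypothetical): "z2" is uninformative and is ignored.
summary_table = {"z1": 0.95, "z2": 0.50, "z3": 0.05}
records = [
    ("z1", 1), ("z1", 1), ("z1", 0), ("z1", 1),
    ("z2", 1), ("z2", 0),
    ("z3", 0), ("z3", 1), ("z3", 0), ("z3", 0),
]
gap = estimate_disparity(records, summary_table)
print(f"estimated outcome-rate gap (A minus B): {gap:.2f}")
```

In this sketch, records whose proxy value is ambiguous are dropped rather than imputed, so the estimate rests on a smaller but cleaner subset of the population. Whether that subset is representative depends on the conditions the abstract refers to, which this sketch does not check.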