In recent years, Face Recognition (FR) systems have achieved human-like (or better) performance, leading to extensive deployment in large-scale practical settings. Yet, especially in sensitive domains such as FR, we expect algorithms to work equally well for everyone, regardless of a person's age, gender, skin colour, or origin. In this paper, we investigate a methodology for quantifying the amount of bias in a trained Convolutional Neural Network (CNN) model for FR that is not only intuitively appealing, but has also already been used in the literature to argue for certain debiasing methods. It works by measuring the "blindness" of the model towards certain face characteristics in the embeddings of faces, based on internal cluster validation measures. We conduct experiments on three openly available FR models to determine their bias regarding race, gender and age, and validate the computed scores by comparing their predictions against the actual drop in face recognition performance for minority cases. Interestingly, we could not link a crisp clustering in the embedding space to a strong bias in recognition rates; if anything, the opposite holds. We therefore offer arguments for the reasons behind this observation and argue for the need for a less naïve clustering approach to develop a working measure of bias in FR models.
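As a concrete illustration of the idea described above, the following sketch scores how crisply a sensitive attribute clusters in an embedding space using an internal cluster validation measure (here the silhouette score over cosine distances). The 512-dimensional embeddings, the cosine metric, and the random stand-in data are assumptions for illustration only, not details taken from the paper.

```python
# Sketch: quantify how separable a sensitive attribute is in a model's
# embedding space using an internal cluster validation measure.
# Assumptions (not from the paper): embeddings are L2-normalised 512-d
# vectors and `attribute_labels` encodes e.g. the annotated gender or
# race group of each face image.
import numpy as np
from sklearn.metrics import silhouette_score

def attribute_separability(embeddings: np.ndarray, attribute_labels: np.ndarray) -> float:
    """Silhouette score of the embeddings when grouped by the sensitive
    attribute. Values near 1 mean the attribute forms crisp clusters
    (the model is highly 'aware' of it); values near 0 mean the model is
    largely 'blind' to it."""
    return silhouette_score(embeddings, attribute_labels, metric="cosine")

# Usage with random data standing in for real face embeddings:
rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 512))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
labels = rng.integers(0, 2, size=1000)
print(attribute_separability(emb, labels))  # ~0 for unstructured embeddings
```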
The morphological assessment of anatomical structures is clinically relevant, but often falls short of quantitative or standardised criteria. Whilst human observers are able to assess morphological characteristics qualitatively, the development of robust shape features remains challenging. In this study, we employ psychometric and radiomic methods to develop quantitative models of the perceived irregularity of intracranial aneurysms (IAs). First, we collect morphological characteristics (e.g. irregularity, asymmetry) for imaging-derived data and aggregate them using rank-based analysis. Second, we compute regression models relating quantitative shape features to the aggregated qualitative ratings (ordinal or binary). We apply our method for quantifying perceived shape irregularity to a dataset of 134 IAs using a pool of 179 different shape indices. Ratings given by 39 participants show good agreement with the aggregated ratings (Spearman rank correlation Sp = 0.84). The best-performing regression model based on quantitative shape features predicts the perceived irregularity with R² = 0.84 ± 0.05.
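A minimal sketch of the regression step described above follows: it fits a cross-validated ridge regression from shape indices to aggregated ratings and reports cross-validated R² together with the Spearman rank correlation between predicted and aggregated ratings. The synthetic feature matrix, the ridge-regression choice, and the 5-fold scheme are illustrative assumptions, not the study's exact pipeline.

```python
# Sketch: relate quantitative shape indices to aggregated perceived-
# irregularity ratings. Data here are synthetic placeholders.
import numpy as np
from scipy.stats import spearmanr
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(134, 179))                       # 134 aneurysms x 179 shape indices (synthetic)
y = X[:, 0] * 0.8 + rng.normal(scale=0.3, size=134)   # synthetic aggregated ratings

# Standardise features, then fit a ridge regression with its penalty
# chosen by internal cross-validation.
model = make_pipeline(StandardScaler(), RidgeCV(alphas=np.logspace(-3, 3, 13)))
y_pred = cross_val_predict(model, X, y, cv=5)

ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
rho, _ = spearmanr(y, y_pred)
print("cross-validated R2:", 1 - ss_res / ss_tot)
print("Spearman rho:", rho)
```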
Face Recognition (FR) is increasingly influencing our lives: we use it to unlock our phones; police use it to identify suspects. Two main concerns are associated with this spread of facial recognition: (1) the fact that these systems are typically less accurate for marginalized groups, which can be described as “bias”, and (2) the increased surveillance through these systems. Our paper is concerned with the first issue. Specifically, we explore an intuitive technique for reducing this bias, namely “blinding” models to sensitive features, such as gender or race, and show why this cannot be equated with reducing bias. Even when not designed for this task, facial recognition models can deduce sensitive features, such as gender or race, from pictures of faces, simply because they are trained to determine the “similarity” of pictures. This means that people with similar skin tones, similar hair length, etc. will be seen as similar by facial recognition models. When confronted with biased decision-making by humans, one approach taken in job application screening is to “blind” the human decision-makers to sensitive attributes such as gender and race by not showing pictures of the applicants. Based on a similar idea, one might think that if facial recognition models were less aware of these sensitive features, the difference in accuracy between groups would decrease. We evaluate this assumption, which has already entered the scientific literature as a valid de-biasing method, by measuring how “aware” models are of sensitive features and correlating this with differences in accuracy. In particular, we blind pre-trained models to make them less aware of sensitive attributes. We find that awareness and accuracy do not positively correlate, i.e., that bias ≠ awareness. In fact, blinding barely affects accuracy in our experiments. The seemingly simple solution of decreasing bias in facial recognition rates by reducing awareness of sensitive features thus does not work in practice: trying to ignore sensitive attributes is not a viable concept for less biased FR.
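One simple way to operationalise such blinding, sketched below, is to fit a linear probe for the sensitive attribute on the embeddings and project the probe direction out; “awareness” can then be measured as the probe's cross-validated accuracy before and after blinding. The synthetic embeddings, the linear probe, and the projection step are illustrative assumptions and not necessarily the procedure used in the paper.

```python
# Sketch of one way to 'blind' embeddings to a sensitive attribute:
# fit a linear probe for the attribute, then project the probe direction
# out of every embedding. Awareness is measured as the probe's accuracy
# before and after projection. Illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def remove_direction(embeddings: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Project embeddings onto the hyperplane orthogonal to `direction`."""
    d = direction / np.linalg.norm(direction)
    return embeddings - np.outer(embeddings @ d, d)

def awareness(embeddings: np.ndarray, attribute_labels: np.ndarray) -> float:
    """Cross-validated accuracy of a linear probe predicting the attribute."""
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, embeddings, attribute_labels, cv=5).mean()

# Synthetic stand-in for face embeddings with an encoded binary attribute:
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=2000)
emb = rng.normal(size=(2000, 128))
emb[:, 0] += 2.0 * labels                     # attribute leaks into one direction

probe = LogisticRegression(max_iter=1000).fit(emb, labels)
blinded = remove_direction(emb, probe.coef_[0])

print("awareness before blinding:", awareness(emb, labels))
print("awareness after blinding: ", awareness(blinded, labels))
```

In this toy setting the attribute lives in a single linear direction, so one projection removes most of the awareness; real face embeddings encode sensitive attributes more diffusely, which is part of why blinding is harder than it looks.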